Putting languages into perspective: A comprehensive database of English words and their Croatian equivalents

Jasmina Jelčić Čolakovac

University of Rijeka, Faculty of Maritime Studies, Croatia

Jasmina Jelčić Čolakovac received her MA degree in English language and History in 2011 at the Faculty of Arts and Sciences in Rijeka. She obtained her PhD degree in Applied Linguistics in 2017 at the University of Ljubljana. Her research interests include English loanwords in Croatian and the processing of metaphoric expressions in bilingual speakers. She has been part of the research team in the newly established Laboratory for Language, Cognition & Neuroscience (LaconLab) since 2020.


https://orcid.org/0000-0002-1241-1283

Irena Bogunović

University of Rijeka, Faculty of Maritime Studies, Croatia

Irena Bogunović received her MA degree in English and Croatian languages in 2008 at the Faculty of Arts and Sciences in Rijeka. She obtained her PhD degree in Cognitive Sciences in 2017 at the University of Zagreb. Her research interests include English loanwords in Croatian and their neurocognitive processing by bilingual speakers. She has been acting as the head of the newly established Laboratory for Language, Cognition & Neuroscience (LaconLab) since 2020.


https://orcid.org/0000-0002-2956-7014


Abstract

Numerous studies have addressed English words in the context of their adaptation, but there is still a need for a systematic perspective on English words in terms of their number and frequency of appearance. This article outlines the procedure used to compile unadapted English words in the Croatian language and gives a comprehensive description of the final product: an open-access database of single-word (SWE) and multi-word (MWE) English expressions extracted from Croatian web corpora (ENGRI and hrWaC) by means of computational-linguistic tools and manual extraction. The final version of the database contains 2,982 English words in their unadapted form (e.g. blockbuster), and 18 words which appear with English orthographic properties in combination with Croatian inflectional affixes (e.g. downloadati). Each SWE and MWE entry in the database is accompanied by its frequency of appearance in both corpora and by its Croatian equivalent where available (29.58% of all entries are listed without an equivalent). The database serves as the first systematic representation of English words in Croatian and provides an indispensable tool for further research into the phenomenon, while also opening the door to a new line of research: the cognitive processing of English words in Croatian.

Keywords:

English words in Croatian, language borrowing, corpus search, database compilation, anglicisms

References

[Dataset] Bogunović, I., Jelčić Čolakovac, J. & Borucinsky, M. (2022). The database of English words and their Croatian equivalents. figshare. DOI: https://doi.org/10.6084/m9.figshare.20014712.v1

[Dataset] Bogunović, I. & Kučić, M. (2021). Korpus hrvatskih novinskih portala ENGRI [Corpus of Croatian news portals ENGRI]. https://urn.nsk.hr/urn:nbn:hr:187:920822.

[Dataset] Bogunović, I., Kučić, M., Ljubešić, N. & Erjavec, T. (2021). Corpus of Croatian news portals ENGRI. Slovenian language resource repository CLARIN.SI. http://hdl.handle.net/11356/1416

[Dataset] Bogunović, I. & Kučić, M. (2022). The database of English words in Croatian.xlsx. figshare. DOI: https://doi.org/10.6084/m9.figshare.20014364.v1

Brdar, I. (2010). Engleske riječi u jeziku hrvatskih medija [English words in the language of Croatian media]. Lahor 10, 174–189.

Alex, B. (2005). An unsupervised system for identifying English inclusions in German text. In C. Callison-Burch & S. Wan (Eds.), 43. Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 133–138). The University of Michigan. https://dl.acm.org/doi/10.5555/1628960.1628985

Alvarez-Mellado, E. (2020). An annotated corpus of emerging Anglicisms in Spanish newspaper headlines. In Proceedings of The 4th Workshop on Computational Approaches to Code Switching (pp. 1–8). European Language Resources Association. https://arxiv.org/abs/2004.02929

Andersen, G. (2012). Semi-automatic approaches to Anglicism detection in Norwegian corpus data. In C. Furiassi, V. Pulcini & F. R. González (Eds.), The anglicization of European lexis (pp. 111–130). John Benjamins. https://doi.org/10.1075/z.174.09

Bogunović, I. & Kučić M. The database of English words in Croatian. Under review.

Bujas, Ž. (2019). Novi englesko-hrvatski rječnik [The new English-Croatian dictionary]. Nakladni zavod Globus.

Crystal, D. (2003). English as a global language (2nd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511486999

Čepon, S. (2017). Anglicizmi v poslovni nomenklaturi turističnih podjetij v Sloveniji. Revija za ekonomske in poslovne vede 2, 35–49.

Ćoso, B. & Bogunović, I. (2017). Person perception and language: A case of English words in Croatian. Language & Communication, 53, 25–34. https://doi.org/10.1016/j.langcom.2016.11.001

Drljača, B. (2006). Anglizmi u ekonomskome nazivlju hrvatskoga jezika i standardnojezična norma [Anglicisms in the economic terminology of the Croatian language and the standard language norm]. Fluminensia, 18(1), 65–85.

Drljača Margić, B. (2014). Contemporary English influence on Croatian: A university students’ perspective. In A. Koll-Stobbe & S. Knospe (Eds.), Language Contact Around the Globe (Proceedings of the LCTG3 Conference, pp. 73–92). Peter Lang.

Entlová, G. & Mala, E. (2020). The occurrence of anglicisms in the Czech and Slovak lexicons. Xlinguae, 13(2), 140–148. https://doi.org/10.18355/XL.2020.13.02.11

Filipović, R. (1990). Anglicisms in Croatian or Serbian: Origin – development – meaning. Školska knjiga.

Furiassi, C. & Hofland, K. (2007). The retrieval of false anglicisms in newspaper texts. In R. Facchinetti (Ed.), Corpus Linguistics 25 Years On (pp. 347–363). Brill/Rodopi. https://doi.org/10.1163/9789401204347_020

Görlach, M. (Ed.). (2002). An Annotated Bibliography of European Anglicisms. Oxford University Press. https://doi.org/10.1515/9783484431027.15

Godwin-Jones, R. (2019). Contributing, creating, curating: Digital literacies for language learners. Language learning & technology, 19(3), 8–20. https://www.lltjournal.org/item/10125-44427/

Greenall, A. K. (2005). To translate or not to translate: Attitudes to English loanwords in Norwegian. In B. Preisler, A. Fabricius, H. Haberland, S. Kjærbeck & K. Risager (Eds), The consequences of mobility (pp. 212–226). Roskilde University.

Halonja, A. & Hudeček, L. (2014). Pokloni mi svoj selfie [Give me your selfie]. Hrvatski jezik, 2, 26–27.

Hudeček, L. & Mihaljević, M. (2005). Nacrt za višerazinsku kontrastivnu englesko-hrvatsku analizu [An outline of a multilevel contrastive Croatian-English analysis]. Rasprave Instituta za hrvatski jezik i jezikoslovlje, 31, 107–151. https://hrcak.srce.hr/9381

Jelčić Čolakovac, J. & Borucinsky, M. (2023). In the melting pot of web-crawled texts: The challenges of extracting English words and phrases from Croatian corpora. International Journal of Applied Linguistics, 34(1), 166–182. https://doi.org/10.1111/ijal.12485

Kavgić, A. (2013). Intended communicative effects of using borrowed English vocabulary from the point of view of the addressor: Corpus-based pragmatic analysis of a magazine column. Jezikoslovlje, 14(2–3), 487–499. https://hrcak.srce.hr/112204

Kay, G. (1995). English loanwords in Japanese. World Englishes, 14(1), 67–76. https://doi.org/10.1111/j.1467-971X.1995.tb00340.x

Kilgarriff, A., Rychlý, P., Smrž, P. & Tugwell, D. (2004). Itri-04-08 The Sketch Engine. Information Technology, pp. 105–116.

Kučić, M. (2021). Creating a web corpus using GO. In M. Koričić et al. (Eds.), 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO) (pp.1676–1678). Croatian Society for Information, Communication and Electronic Technology - MIPRO: Rijeka. DOI: https://doi.org/10.23919/MIPRO52101.2021.9597093

Luján García, C. (2017). Analysis of the presence of Anglicisms in a Spanish internet forum: some terms from the fields of fashion, beauty, and leisure. Alicante Journal of English Studies, 30, –305. https://doi.org/10.14198/raei.2017.30.10

Ljubešić, N. & Erjavec, T. (2011). HrWaC and slWac: compiling web corpora for Croatian and Slovene. In I. Habernal & V. Matoušek (Eds.), Text, speech and dialogue, lecture notes in computer science (pp. 395–402). Springer.

Ljubešić, N. & Klubička, F. (2016). {bs, hr, sr} wac-web corpora of Bosnian, Croatian and Serbian. In F. Bildhauer & R. Schäfer (Eds.), Proceedings of the 9th web as corpus workshop (WaC-9) (pp. 29–35). Association for Computational Linguistics. http://dx.doi.org/10.3115/v1/W14-0405

McKenzie, R. M. (2010). The social psychology of English as a global language: Attitudes, awareness and identity in the Japanese context. Springer. https://doi.org/10.1007/978-90-481-8566-5

Međeral, K. (2016). Jezične bakterije – pomagači ili štetočine u jezičnome organizmu? [Language bacteria – helpers or foes in the language organism?]. Hrvatski jezik, 3, 1–10. https://hrcak.srce.hr/171398

Mihaljević Djigunović, J. & Geld, R. (2003). English in Croatia today: Opportunities for incidental vocabulary acquisition. Studia Romanica et Anglica Zagrabiensia, 43, 335–352. https://hrcak.srce.hr/21021

Muhvić-Dimanovski, V. & Skelin Horvat, A. (2006). O riječima stranoga podrijetla i njihovu nazivlju [On words of foreign origin and their terminology]. Filologija, 44-47, 203–215. https://hrcak.srce.hr/22242

Muhvić-Dimanovski, V. & Skelin Horvat, A. (2008). Contests and nominations for new words: why are they interesting and what do they show. Suvremena lingvistika, 65(1), 1–26. https://hrcak.srce.hr/25183

Muhvić-Dimanovski, V., Skelin Horvat, A. & Hriberski, D. (2016). Rječnik neologizama u hrvatskome jeziku [The dictionary of neologisms in Croatian]. www.rjecnik.neologizam.ffzg.unizg.hr

Nikolić-Hoyt, A. (2005). Englesko-hrvatski jezično-kulturni dodiri [English and Croatian in language and cultural contacts]. In D. Stolac, N. Ivanetić & B. Pritchard (Eds.), Jezik u društvenoj interakciji (Zbornik radova sa savjetovanja održanoga 16. i 17. svibnja u Opatiji) (pp. 353–358). Zagreb: Hrvatsko društvo za primijenjenu lingvistiku.

Patekar, J. (2019). Prihvatljivost prevedenica kao zamjena za anglizme [The acceptability of loan translations as substitutes for anglicisms]. Fluminensia, 31(2), 143–179. https://doi.org/10.31820/f.31.2.17

Pulcini, V., Furiassi, C. & Gonzales, F. R. (2012). The lexical influence of English on European languages: From words to phraseology. In V. Pulcini, C. Furiassi & F. R. Rodrigues (Eds.), Anglicization of European lexis (pp. 1–27). John Benjamins. https://doi.org/10.1075/z.174.03pul

Rüdiger, S. (2018). Mixed feelings: Attitudes towards English loanwords and their use in South Korea. Open Linguistics, 4, 184–198. https://doi.org/10.1515/opli-2018-0010

Serigos, J. R. L. (2017). Applying corpus and computational methods to loanword research: new approaches to Anglicisms in Spanish. [Unpublished doctoral thesis]. University of Texas at Austin.

Tadić, M. (2022). European language equality: Report on the Croatian language. European Language Equality (ELE): Berlin. https://european-language-equality.eu/wp-content/uploads/2022/03/ELE___Deliverable_D1_7__Language_Report_Croatian_.pdf

Tadić, M., D. Brozović-Rončević & Kapetanović, A. (2012). Hrvatski jezik u digitalnom dobu [The Croatian language in the digital age]. Springer. https://doi.org/10.1007/978-3-642-30882-6_9

Zourou, K. (2012). On the attractiveness of social media for language learning: a look at the state of the art. Alsic. Apprentissage Des Langues et Systèmes d’Information et de Communication, 15(1). https://doi.org/10.4000/alsic.2436


Published
2024-11-07


Jelčić Čolakovac, J. and Bogunović, I. (2024) "Putting languages into perspective: A comprehensive database of English words and their Croatian equivalents", Crossroads. A Journal of English Studies, (45), pp. 62–81. doi: 10.15290/CR.2024.45.2.04.
