Les prénoms et les patronymes dans les ressources dictionnairiques pour le traitement automatique du polonais par NooJ

Krzysztof Bogacki

Uniwersytet Warszawski
https://orcid.org/0000-0003-2755-4276

Agnieszka Dryjańska

Uniwersytet Warszawski
https://orcid.org/0000-0003-1649-8408


Abstract

This paper reports on a study whose purpose was to provide researchers specializing in the automatic treatment of natural languages with linguistic resources dedicated to Polish, namely dictionaries and local grammars. We first present a morphological dictionary of first names and surnames in NooJ format. The dictionary, compiled from texts collected from several sources published on the Internet, contains nearly 466,000 headwords (7,586 first names and 458,244 surnames). To reduce the size of the dictionary, we propose a modular approach to the construction of local grammars. It requires, however, the creation of more than 40 local grammars for surnames and almost twice as many for first names. Altogether, the dictionary recognizes about 33 MB of forms. Because a solution based on a list of first names and surnames consumes considerable time and disk space, we introduce another approach based on local grammars only. In the final part of the paper, we discuss the advantages and disadvantages of both solutions, as well as the semantic and grammatical ambiguities that cannot be resolved in either approach. The paper also sets out the reasons for choosing this part of the lexicon. After a brief overview of the properties that distinguish proper nouns from common nouns, we describe those properties that directly affect the forms of surnames in Polish and constitute the main sources of contrast among them. In addition to the grammatical categories (case, gender and number) affecting the forms of surnames, we also point out their origin (Slavic, Latin, Greek, biblical, etc.). As for the usage rules governing Polish surnames, which may be applied strictly or more flexibly, we have adopted a liberal approach that does not exclude certain forms even though purists may consider them erroneous.

Keywords:

automatic treatment of natural languages, surnames, first names


Published
2019-12-30


Bogacki, K. and Dryjańska, A. (2019) “Les prénoms et les patronymes dans les ressources dictionnairiques pour le traitement automatique du polonais par NooJ”, Białostockie Archiwum Językowe, (19), pp. 47–65. doi: 10.15290/baj.2019.19.03.
