UniMorph

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. Its goal is to annotate morphological data in a universal schema, so that an inflected word from any language is defined by its lexical meaning, typically carried by the lemma, and by a bundle of morphological features from our schema that describes its inflectional form. The schema is specified in Sylak-Glassman (2016).
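As a rough illustration of this lemma-plus-feature-bundle idea, the sketch below parses one annotated word into a small record. The tab-separated line layout, the semicolon-separated tags such as `V;PST`, and the names `Entry` and `parse_entry` are assumptions made for the example; they are not stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One inflected word: its lemma, surface form, and feature bundle."""
    lemma: str
    form: str
    features: frozenset


def parse_entry(line: str) -> Entry:
    # Assumed layout: lemma<TAB>form<TAB>features, with tags separated by ';'.
    lemma, form, tags = line.rstrip("\n").split("\t")
    return Entry(lemma, form, frozenset(tags.split(";")))


# Illustrative English examples, not copied from the UniMorph data.
entries = [parse_entry(raw) for raw in ("run\tran\tV;PST", "mouse\tmice\tN;PL")]
for entry in entries:
    print(entry.lemma, "->", entry.form, sorted(entry.features))
```

Storing the features as a set reflects that a bundle is an unordered collection of tags, so two entries with the same tags in a different order compare equal.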

Annotated Languages

The following 51 languages have been annotated according to the UniMorph schema. Missing parts of speech will be filled in soon.

Language            ISO-639-3  Forms    Paradigms  2016 Shared Task Splits  2017 Shared Task Splits
Albanian            sqi        33483    589        no                       yes
Arabic              ara        140003   4134       yes                      yes
Armenian            hye        338461   7033       no                       yes
Basque              eus        11889    26         no                       yes
Bengali             ben        4443     136        no                       yes
Bulgarian           bul        55730    2468       no                       yes
Catalan             cat        81576    1547       no                       yes
Central Kurdish     ckb        22990    274        no                       yes
Czech               ces        134527   5125       no                       yes
Danish              dan        25503    3193       no                       yes
Dutch               nld        55467    4993       no                       yes
English             eng        115523   22765      no                       yes
Estonian            est        38215    886        no                       yes
Faroese             fao        45474    3077       no                       yes
Finnish             fin        2490377  57642      yes                      yes
French              fra        367732   7535       no                       yes
Georgian            kat        74412    3782       yes                      yes
German              deu        179339   15060      yes                      yes
Haida               hai        7040     41         no                       yes
Hebrew              heb        13818    510        no                       yes
Hindi               hin        54438    258        no                       yes
Hungarian           hun        490394   13989      yes                      yes
Icelandic           isl        76915    4775       no                       yes
Irish               gle        107298   7464       no                       yes
Italian             ita        509574   10009      no                       yes
Khaling             klr        156097   591        no                       yes
Latin               lat        509182   17214      no                       yes
Latvian             lav        136998   7548       no                       yes
Lithuanian          lit        34130    1458       no                       yes
Lower Sorbian       dsb        20121    994        no                       yes
Macedonian          mkd        168057   10313      no                       yes
Navajo              nav        12354    674        yes                      yes
Northern Kurdish    kmr        216370   15083      no                       yes
Northern Sami       sme        62677    2103       no                       yes
Norwegian Bokmål    nob        19238    5527       no                       yes
Norwegian Nynorsk   nno        15319    4689       no                       yes
Persian             fas        37128    273        no                       yes
Polish              pol        201024   10185      no                       yes
Portuguese          por        303996   4001       no                       yes
Quechua             que        180004   1006       no                       yes
Romanian            ron        80266    4405       no                       yes
Russian             rus        473481   28068      yes                      yes
Scottish Gaelic     gla        781      73         no                       yes
Slovak              slk        14796    1046       no                       yes
Slovenian           slv        60110    2535       no                       yes
Spanish             spa        382955   5460       yes                      yes
Swedish             swe        78411    10553      no                       yes
Turkish             tur        275460   3579       yes                      yes
Ukrainian           ukr        20904    1493       no                       yes
Urdu                urd        12572    182        no                       yes
Welsh               cym        10641    183        no                       yes
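
The Forms and Paradigms counts can be combined into a rough measure of how many inflected forms a single paradigm holds in each language. The following is a minimal sketch using a few rows copied from the table above; the names `ROWS` and `forms_per_paradigm` and the choice of rows are illustrative only.

```python
# (language, forms, paradigms) copied from the table above.
ROWS = [
    ("Basque", 11889, 26),
    ("English", 115523, 22765),
    ("Finnish", 2490377, 57642),
    ("Turkish", 275460, 3579),
]


def forms_per_paradigm(forms: int, paradigms: int) -> float:
    """Average number of inflected forms per paradigm."""
    return forms / paradigms


# Print the selected languages from the highest average to the lowest.
for language, forms, paradigms in sorted(
    ROWS, key=lambda row: forms_per_paradigm(row[1], row[2]), reverse=True
):
    print(f"{language:<10} {forms_per_paradigm(forms, paradigms):8.1f}")
```

On these rows, the 26 Basque paradigms average several hundred forms each, while English averages about five. The ratio also depends on which parts of speech have been annotated so far, so it is only a coarse indicator.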

Coming Attractions!

The following languages are in the process of being annotated according to the UniMorph specification.

Language            ISO-639-3  Forms    Paradigms  2016 Shared Task Splits  2017 Shared Task Splits
!Xóõ                nmn        0        0          no                       no
Adyghe              ady        0        0          no                       no
Afrikaans           afr        0        0          no                       no
Ancient Greek       grc        0        0          no                       no
Aragonese           arg        0        0          no                       no
Aramaic             arc        0        0          no                       no
Asturian            ast        0        0          no                       no
Azerbaijani         aze        0        0          no                       no
Bashkir             bak        0        0          no                       no
Belarusian          bel        0        0          no                       no
Breton              bre        0        0          no                       no
Buriat              bua        0        0          no                       no
Chechen             che        0        0          no                       no
Church Slavic       chu        0        0          no                       no
Classical Armenian  xcl        0        0          no                       no
Classical Nahuatl   nci        0        0          no                       no
Classical Syriac    syc        0        0          no                       no
Cornish             cor        0        0          no                       no
Corsican            cos        0        0          no                       no
Crimean Tatar       crh        0        0          no                       no
Egyptian Arabic     arz        0        0          no                       no
Friulian            fur        0        0          no                       no
Gagauz              gag        0        0          no                       no
Galician            glg        0        0          no                       no
Gothic              got        0        0          no                       no
Hausa               hau        0        0          no                       no
Hittite             hit        0        0          no                       no
Ingrian             izh        0        0          no                       no
Inuktitut           iku        0        0          no                       no
Istriot             ist        0        0          no                       no
Japanese            jpn        0        0          no                       no
Jèrriais            nrf        0        0          no                       no
Kabardian           kbd        0        0          no                       no
Kalaallisut         kal        0        0          no                       no
Kannada             kan        0        0          no                       no
Karelian            krl        0        0          no                       no
Kashubian           csb        0        0          no                       no
Kazakh              kaz        0        0          no                       no
Khakas              kjh        0        0          no                       no
Kirghiz             kir        0        0          no                       no
Korean              kor        0        0          no                       no
Ladin               lld        0        0          no                       no
Ladino              lad        0        0          no                       no
Limburgan           lim        0        0          no                       no
Liv                 liv        0        0          no                       no
Low German          nds        0        0          no                       no
Luxembourgish       ltz        0        0          no                       no
Macedo-Romanian     rup        0        0          no                       no
Malagasy            mlg        0        0          no                       no
Malay               msa        0        0          no                       no
Malayalam           mal        0        0          no                       no
Maltese             mlt        0        0          yes                      no
Mandarin Chinese    cmn        0        0          no                       no
Manx                glv        0        0          no                       no
Mapudungun          arn        0        0          no                       no
Middle Dutch        dum        0        0          no                       no
Middle French       frm        0        0          no                       no
Middle High German  gmh        0        0          no                       no
Middle Low German   gml        0        0          no                       no
Mirandese           mwl        0        0          no                       no
Modern Greek        ell        0        0          no                       no
Neapolitan          nap        0        0          no                       no
Northern Frisian    frr        0        0          no                       no
Northern Tiwa       twf        0        0          no                       no
Occitan             oci        0        0          no                       no
Ojibwa              oji        0        0          no                       no
Old Dutch           odt        0        0          no                       no
Old English         ang        0        0          no                       no
Old French          fro        0        0          no                       no
Old Irish           sga        0        0          no                       no
Old Norse           non        0        0          no                       no
Old Portuguese      pto        0        0          no                       no
Old Provençal       pro        0        0          no                       no
Old Saxon           osx        0        0          no                       no
Panjabi             pan        0        0          no                       no
Pushto              pus        0        0          no                       no
Romansh             roh        0        0          no                       no
Romany              rom        0        0          no                       no
Sanskrit            san        0        0          no                       no
Sardinian           srd        0        0          no                       no
Saterfriesisch      stq        0        0          no                       no
Serbian             srp        0        0          no                       no
Sicilian            scn        0        0          no                       no
Skolt Sami          sms        0        0          no                       no
Swahili             swa        0        0          no                       no
Swiss German        gsw        0        0          no                       no
Tajik               tgk        0        0          no                       no
Tatar               tat        0        0          no                       no
Telugu              tel        0        0          no                       no
Tibetan             bod        0        0          no                       no
Tswana              tsn        0        0          no                       no
Turkmen             tuk        0        0          no                       no
Uighur              uig        0        0          no                       no
Uzbek               uzb        0        0          no                       no
Venetian            vec        0        0          no                       no
Votic               vot        0        0          no                       no
Võro                vro        0        0          no                       no
Walloon             wln        0        0          no                       no
Western Frisian     fry        0        0          no                       no
Wymysorys           wym        0        0          no                       no
Yiddish             yid        0        0          no                       no
Yucatec Maya        yua        0        0          no                       no
Zulu                zul        0        0          no                       no