UniMorph

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. Its goal is to annotate morphological data in a universal schema, so that an inflected word from any language can be defined by its lexical meaning, typically carried by the lemma, and by a bundle of morphological features from the schema that describes its inflectional form. The schema is specified in Sylak-Glassman (2016).
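
The lemma-plus-feature-bundle idea can be illustrated with a minimal Python sketch. It assumes a tab-separated record of lemma, inflected form, and a semicolon-separated feature bundle; that layout, the `Entry` class, the `parse_entry` helper, and the sample tags `V` and `PST` are illustrative assumptions, not taken from the text above, so the schema specification should be consulted for the authoritative tag inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One inflected word: its lemma, surface form, and feature bundle."""
    lemma: str
    form: str
    features: frozenset


def parse_entry(line: str) -> Entry:
    # Assumed layout (hypothetical): lemma <TAB> form <TAB> feat1;feat2;...
    lemma, form, bundle = line.rstrip("\n").split("\t")
    # The bundle is unordered, so a frozenset is a natural representation.
    return Entry(lemma, form, frozenset(bundle.split(";")))


# Illustrative record: the English past-tense form of "run".
entry = parse_entry("run\tran\tV;PST")
```

Storing the features as a set rather than a string makes two bundles compare equal regardless of the order in which their tags were written.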

Annotated Languages

The following 111 languages have been annotated according to the UniMorph schema. Missing parts of speech will be filled in soon.

Language ISO 639-3 Forms Paradigms Nouns Verbs Adjectives Source License
Adyghe ady 20475 1666 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Albanian sqi 33483 589 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Ancient Greek grc 41593 2431 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Arabic ara 140003 4134 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: true
Armenian hye 338461 7033 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Asturian ast 29797 436 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Azerbaijani aze 8004 340 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Bashkir bak 12168 1084 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Basque eus 11889 26
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Belarusian bel 16113 1027 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Bengali ben 4443 136 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Breton bre 2294 44 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Bulgarian bul 55730 2468 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Catalan cat 81576 1547 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Central Kurdish ckb 22990 274 LGPLLR
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Classical Armenian xcl 97181 4300 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Classical Syriac syc 3652 160 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: true
Cornish cor 469 9 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Crimean Tatar crh 7514 1230 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Czech ces 134527 5125 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Danish dan 25503 3193 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Dutch nld 55467 4993 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
English eng 115523 22765 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Estonian est 38215 886 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Faroese fao 45474 3077 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Finnish fin 2490377 57642 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
French fra 367732 7535 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Friulian fur 8071 168 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Galician glg 36801 486 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Georgian kat 74412 3782 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
German deu 179339 15060 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Gothic got Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Greenlandic kal 368 23 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Haida hai 7040 41
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Hebrew heb 13818 510 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: true
Hindi hin 54438 258 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Hungarian hun 490394 13989 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Icelandic isl 76915 4775 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Ingrian izh 1099 50 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Irish gle 107298 7464 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Italian ita 509574 10009 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Kabardian kbd 3092 250 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Kannada kan 6402 159 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Karelian krl 682 20 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Kashubian csb 509 37 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Kazakh kaz 357 26 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Khakas kjh 1200 75 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Khaling klr 156097 591 LGPLLR
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Ladin lld 7656 180 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Latin lat 509182 17214 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Latvian lav 136998 7548 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Lithuanian lit 34130 1458 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Livonian liv 3987 203 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Livvi olo
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Low German nds Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Lower Sorbian dsb 20121 994 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Ludian lud
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Macedonian mkd 168057 10313 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Maltese mlt 3584 112 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Manx glv 14 1 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Mapudungun arn 783 26 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Middle French frm 36970 603 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Middle High German gmh 708 29 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Middle Low German gml 1513 52 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Modern Greek ell 199763 11906 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Murrinhpatha mwf 1110 29 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Navajo nav 12354 674 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Neapolitan nap 1808 40 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Norman xno 280 5 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
North Frisian frr 3204 51 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Northern Kurdish kmr 216370 15083 LGPLLR
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Northern Sami sme 62677 2103 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Norwegian Bokmål nob 19238 5527 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Norwegian Nynorsk nno 15319 4689 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Occitan oci 8316 174 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Church Slavonic chu 4148 152 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old English ang 42425 1867 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old French fro 123374 1700 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Irish sga 1089 49 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Saxon osx 22287 863 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Pashto pus 6945 395 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Persian fas 37128 273 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Polish pol 201024 10185 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Portuguese por 303996 4001 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Quechua que 180004 1006 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Romanian ron 80266 4405 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Russian rus 473481 28068 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Sanskrit san 33847 917 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Scottish Gaelic gla 781 73 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Serbo-Croatian hbs 840799 24419 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Slovak slk 14796 1046 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Slovenian slv 60110 2535 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Spanish spa 382955 5460 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Swahili swc 10092 100 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Swedish swe 78411 10553 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Tajik tgk 77 75 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Tatar tat 7832 1283 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Telugu tel 1548 127 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Tibetan bod 353 65 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Turkish tur 275460 3579 Creative Commons License
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Turkmen tuk 810 68 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Ukrainian ukr 20904 1493 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Urdu urd 12572 182 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Uzbek uzb 1260 15 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Venetian vec 18227 368 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Veps (Vepsian) vep
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Votic vot 1430 55 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: true
Welsh cym 10641 183 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
West Frisian fry 1429 85 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Yiddish yid 7986 803 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Zulu zul 49119 566 Creative Commons License
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
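
Each entry in the listing above starts with a header line (language name, ISO 639-3 code, then optional form and paradigm counts and an optional license) followed by bulleted attributes. A minimal sketch for reading those header lines into records might look like the following; the `HEADER` pattern and the `parse_header` helper are hypothetical names, and the pattern assumes the code is the first lowercase three-letter token after the name.

```python
import re
from typing import Optional

# name, 3-letter lowercase code, optional "forms paradigms", optional license
HEADER = re.compile(
    r"^(?P<name>.+?) (?P<iso>[a-z]{3})\b"
    r"(?: (?P<forms>\d+) (?P<paradigms>\d+))?"
    r"(?: (?P<license>.+))?$"
)


def parse_header(line: str) -> Optional[dict]:
    text = line.strip()
    # Bulleted attribute lines are not entry headers.
    if text.startswith("•"):
        return None
    match = HEADER.match(text)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the counts to integers when they are present.
    for key in ("forms", "paradigms"):
        if record[key] is not None:
            record[key] = int(record[key])
    return record
```

Entries that list no counts or no license, such as the Livvi line, come back with `None` in the missing fields rather than failing to parse.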

Coming Attractions!

The following languages are in the process of being annotated according to the UniMorph specification.

Language ISO 639-3
!Xóõ nmn
  • Typology: fusional
  • Templatic: false
  • Info: wikipedia
  • Type: living
Afrikaans afr
  • Typology: fusional
  • Templatic: false
  • Info: wikipedia
  • Type: living
Aragonese arg
  • Typology: fusional
  • Templatic: false
  • Info: wikipedia
  • Type: living
Aramaic arc
  • Typology: fusional
  • Templatic: true
  • Info: wikipedia
  • Type: ancient
Buriat bua
Chechen che
Classical Nahuatl nci
Corsican cos
Egyptian Arabic arz
Gagauz gag
Hausa hau
Hittite hit
Inuktitut iku
Istriot ist
Japanese jpn
Jèrriais nrf
Kalaallisut kal
Kirghiz kir
Korean kor
Ladino lad
Limburgan lim
Luxembourgish ltz
Macedo-Romanian rup
Malagasy mlg
Malay msa
Malayalam mal
Mandarin Chinese cmn
Middle Dutch dum
Mirandese mwl
Northern Tiwa twf
Ojibwa oji
Old Dutch odt
Old Norse non
Old Portuguese pto
Old Provençal pro
Panjabi pan
Romansh roh
Romany rom
Sardinian srd
Saterfriesisch stq
Serbian srp
Sicilian scn
Skolt Sami sms
Swiss German gsw
Tswana tsn
Uighur uig
Võro vro
Walloon wln
Wymysorys wym
Yucatec Maya yua