UniMorph

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP systems handle complex morphology in the world’s languages. The goal of UniMorph is to annotate morphological data in a universal schema, so that an inflected word from any language is defined by its lexical meaning, typically carried by the lemma, and by a bundle of morphological features from the schema that renders its inflectional form. The schema specification is described here and in Sylak-Glassman (2016).
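
As a rough illustration of the lemma-plus-feature-bundle idea, the sketch below parses one tab-separated line into a small record. The three-column layout (lemma, inflected form, semicolon-separated feature tags), the `Entry` class, and the `parse_entry` helper are assumptions made for this example, and the sample line is illustrative rather than taken from a released data file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    lemma: str            # carries the lexical meaning
    form: str             # the inflected word
    features: frozenset   # bundle of morphological feature tags


def parse_entry(line: str) -> Entry:
    """Parse one 'lemma<TAB>form<TAB>FEAT;FEAT' line (assumed layout)."""
    lemma, form, tags = line.rstrip("\n").split("\t")
    return Entry(lemma, form, frozenset(tags.split(";")))


# Illustrative line only: English "ran" as the past tense of "run".
entry = parse_entry("run\tran\tV;PST")
print(entry.lemma, entry.form, sorted(entry.features))
# prints: run ran ['PST', 'V']
```

Storing the features as a set rather than a string reflects that the bundle is unordered: two entries with the same tags in a different order compare equal.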

UniMorph Events

Annotated Languages

The following 107 languages have been annotated according to the UniMorph schema. Missing parts of speech will be filled in soon.

Language ISO 639-3 Forms Paradigms Nouns Verbs Adjectives Source License
Adyghe ady
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Albanian sqi 33483 589
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Arabic ara 140003 4134
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: true
Armenian hye 338461 7033
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Asturian ast
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Azerbaijani aze
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Bashkir bak
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Basque eus 11889 26
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Belarusian bel
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Bengali ben 4443 136
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Breton bre
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Bulgarian bul 55730 2468
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Catalan cat 81576 1547
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Central Kurdish ckb 22990 274
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Classical Armenian xcl
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Classical Syriac syc
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: true
Cornish cor
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Crimean Tatar crh
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Czech ces 134527 5125
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Danish dan 25503 3193
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Dutch nld 55467 4993
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
English eng 115523 22765
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Estonian est 38215 886
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Faroese fao 45474 3077
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Finnish fin 2490377 57642
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
French fra 367732 7535
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Friulian fur
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Galician glg
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Georgian kat 74412 3782
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
German deu 179339 15060
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Gothic got
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Greenlandic kal
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Haida hai 7040 41
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Hebrew heb 13818 510
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: true
Hindi hin 54438 258
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Hungarian hun 490394 13989
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Icelandic isl 76915 4775
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Ingrian izh
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Irish gle 107298 7464
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Italian ita 509574 10009
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Kabardian kbd
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Kannada kan
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Karelian krl
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Kashubian csb
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Kazakh kaz
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Khakas kjh
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Khaling klr 156097 591
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Ladin lld
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Latin lat 509182 17214
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Latvian lav 136998 7548
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Lithuanian lit 34130 1458
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Livonian liv
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Low German nds
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Lower Sorbian dsb 20121 994
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Macedonian mkd 168057 10313
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Maltese mlt
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Manx glv
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Mapudungun arn
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Middle French frm
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Middle High German gmh
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Middle Low German gml
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Modern Greek ell
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Murrinhpatha mwf
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Navajo nav 12354 674
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Neapolitan nap
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Norman xno
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
North Frisian frr
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Northern Kurdish kmr 216370 15083
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Northern Sami sme 62677 2103
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Norwegian Bokmål nob 19238 5527
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Norwegian Nynorsk nno 15319 4689
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Occitan oci
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Church Slavonic chu
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old English ang
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old French fro
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Irish sga
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Saxon osx
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Pashto pus
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Persian fas 37128 273
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Polish pol 201024 10185
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Portuguese por 303996 4001
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Quechua que 180004 1006
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Romanian ron 80266 4405
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Russian rus 473481 28068
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Sanskrit san
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Scottish Gaelic gla 781 73
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Serbo-Croatian hbs
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Slovak slk 14796 1046
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Slovenian slv 60110 2535
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Spanish spa 382955 5460
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Swahili swc
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Swedish swe 78411 10553
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Tajik tgk
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Tatar tat
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Telugu tel
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Tibetan bod
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Turkish tur 275460 3579
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Turkmen tuk
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Ukrainian ukr 20904 1493
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Urdu urd 12572 182
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Uzbek uzb
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Venetian vec
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Votic vot
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: true
Welsh cym 10641 183
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
West Frisian fry
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Yiddish yid
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Zulu zul
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
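
The listing above follows a regular pattern: a header line with the language name, its ISO 639-3 code, and optional form and paradigm counts, followed by bulleted attributes. A minimal sketch of reading that pattern into records, assuming the plain-text layout shown on this page (the `parse_listing` function and the record field names are hypothetical):

```python
import re

# Header line: name, three-letter code, then optional form and paradigm counts.
HEADER = re.compile(
    r"^(?P<name>.+?) (?P<iso>[a-z]{3})(?: (?P<forms>\d+) (?P<paradigms>\d+))?$"
)
# Attribute line: bullet, key, colon, optional value.
ATTR = re.compile(r"^\s*•\s*(?P<key>[^:]+):\s*(?P<value>.*)$")


def parse_listing(text):
    """Turn the flattened listing into a list of dicts, one per language."""
    records = []
    for line in text.splitlines():
        attr = ATTR.match(line)
        if attr:
            # Attach the attribute to the most recent language header.
            if records:
                records[-1][attr["key"].strip()] = attr["value"].strip()
            continue
        header = HEADER.match(line.strip())
        if header:
            records.append({
                "name": header["name"],
                "iso": header["iso"],
                "forms": int(header["forms"]) if header["forms"] else None,
                "paradigms": int(header["paradigms"]) if header["paradigms"] else None,
            })
    return records


sample = """Adyghe ady
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Finnish fin 2490377 57642
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false"""

records = parse_listing(sample)
finnish = records[1]
# Average number of forms per paradigm for Finnish.
print(finnish["iso"], round(finnish["forms"] / finnish["paradigms"], 1))
# prints: fin 43.2
```

Entries without counts, such as Adyghe, keep `None` for `forms` and `paradigms`, so they can be told apart from entries that report zero.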

Coming Attractions!

The following languages are in the process of being annotated according to the UniMorph specification.

Language ISO 639-3 Forms Paradigms Nouns Verbs Adjectives Source License
!Xóõ nmn
  • Typology: fusional
  • Templatic: false
  • Type: living
Afrikaans afr
  • Typology: fusional
  • Templatic: false
  • Type: living
Ancient Greek grc
  • Typology: fusional
  • Templatic: false
  • Type: historical
Aragonese arg
  • Typology: fusional
  • Templatic: false
  • Type: living
Aramaic arc
  • Typology: fusional
  • Templatic: true
  • Type: ancient
Buriat bua 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Chechen che 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Classical Nahuatl nci 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Corsican cos 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Egyptian Arabic arz 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Gagauz gag 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Hausa hau 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Hittite hit 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Inuktitut iku 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Istriot ist 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Japanese jpn 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Jèrriais nrf 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Kalaallisut kal 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Kirghiz kir 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Korean kor 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Ladino lad 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Limburgan lim 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Luxembourgish ltz 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Macedo-Romanian rup 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Malagasy mlg 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Malay msa 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Malayalam mal 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Mandarin Chinese cmn 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Middle Dutch dum 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Mirandese mwl 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Northern Tiwa twf 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Ojibwa oji 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Old Dutch odt 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Old Norse non 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Old Portuguese pto 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Old Provençal pro 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Panjabi pan 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Romansh roh 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Romany rom 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Sardinian srd 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Saterfriesisch stq 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Serbian srp 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Sicilian scn 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Skolt Sami sms 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Swiss German gsw 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Tswana tsn 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Uighur uig 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Võro vro 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Walloon wln 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Wymysorys wym 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
Yucatec Maya yua 0 0
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no