UniMorph

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. UniMorph annotates morphological data in a universal schema, so that an inflected word from any language is defined by two things: its lexical meaning, typically carried by the lemma, and its inflectional form, rendered as a bundle of morphological features from the schema. The specification of the schema is described here and in Sylak-Glassman (2016).
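The lemma-plus-feature-bundle idea can be illustrated with a minimal Python sketch. It assumes a three-column, tab-separated layout (lemma, inflected form, semicolon-separated feature tags). That layout, the `Entry` class, the `parse_line` helper, and the sample rows are illustrative assumptions, not something this page specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One inflected word: its lemma, surface form, and feature bundle."""
    lemma: str
    form: str
    features: frozenset


def parse_line(line: str) -> Entry:
    # Assumed layout: lemma <TAB> inflected form <TAB> TAG;TAG;...
    lemma, form, bundle = line.rstrip("\n").split("\t")
    # The bundle is treated as an unordered set of feature tags.
    return Entry(lemma, form, frozenset(bundle.split(";")))


# Hypothetical sample rows, written only for this illustration.
sample = "run\tran\tV;PST\nrun\trunning\tV;V.PTCP;PRS\n"

entries = [parse_line(row) for row in sample.splitlines() if row.strip()]

for entry in entries:
    print(entry.lemma, entry.form, sorted(entry.features))
```

Storing the bundle as a set rather than a string means two entries can be compared without depending on the order in which the tags were written.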

Annotated Languages

The following 107 languages have been annotated according to the UniMorph schema. Missing parts of speech will be filled in soon.

Language ISO 639-3 Forms Paradigms Nouns Verbs Adjectives Source License
Adyghe ady
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Albanian sqi 33483 589
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Arabic ara 140003 4134
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: true
Armenian hye 338461 7033
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Asturian ast
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Azerbaijani aze
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Bashkir bak
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Basque eus 11889 26
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Belarusian bel
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Bengali ben 4443 136
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Breton bre
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Bulgarian bul 55730 2468
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Catalan cat 81576 1547
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Central Kurdish ckb 22990 274
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Classical Armenian xcl
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Classical Syriac syc
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: true
Cornish cor
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Crimean Tatar crh
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Czech ces 134527 5125
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Danish dan 25503 3193
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Dutch nld 55467 4993
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
English eng 115523 22765
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Estonian est 38215 886
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Faroese fao 45474 3077
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Finnish fin 2490377 57642
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
French fra 367732 7535
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Friulian fur
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Galician glg
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Georgian kat 74412 3782
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
German deu 179339 15060
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Gothic got
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Greenlandic kal
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Haida hai 7040 41
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Hebrew heb 13818 510
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: true
Hindi hin 54438 258
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Hungarian hun 490394 13989
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Icelandic isl 76915 4775
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Ingrian izh
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Irish gle 107298 7464
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Italian ita 509574 10009
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Kabardian kbd
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Kannada kan
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Karelian krl
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Kashubian csb
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Kazakh kaz
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Khakas kjh
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Khaling klr 156097 591
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Ladin lld
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Latin lat 509182 17214
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Latvian lav 136998 7548
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Lithuanian lit 34130 1458
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Livonian liv
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Low German nds
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Lower Sorbian dsb 20121 994
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Macedonian mkd 168057 10313
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Maltese mlt
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Manx glv
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Mapudungun arn
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Middle French frm
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Middle High German gmh
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Middle Low German gml
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Modern Greek ell
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Murrinhpatha mwf
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Navajo nav 12354 674
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Neapolitan nap
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Norman xno
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
North Frisian frr
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Northern Kurdish kmr 216370 15083
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Northern Sami sme 62677 2103
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Norwegian Bokmål nob 19238 5527
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Norwegian Nynorsk nno 15319 4689
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Occitan oci
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Church Slavonic chu
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old English ang
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old French fro
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Irish sga
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Old Saxon osx
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Pashto pus
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Persian fas 37128 273
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Polish pol 201024 10185
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Portuguese por 303996 4001
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Quechua que 180004 1006
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Romanian ron 80266 4405
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Russian rus 473481 28068
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Sanskrit san
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Scottish Gaelic gla 781 73
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Serbo-Croatian hbs
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology:
  • Templatic:
Slovak slk 14796 1046
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Slovenian slv 60110 2535
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Spanish spa 382955 5460
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Swahili swc
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Swedish swe 78411 10553
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Tajik tgk
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Tatar tat
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Telugu tel
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Tibetan bod
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Turkish tur 275460 3579
  • 2016 Shared Task Splits: yes
  • 2017 Shared Task Splits: yes
  • Typology: agglutinative
  • Templatic: false
Turkmen tuk
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Ukrainian ukr 20904 1493
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Urdu urd 12572 182
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
Uzbek uzb
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false
Venetian vec
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Votic vot
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: true
Welsh cym 10641 183
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: yes
  • Typology: fusional
  • Templatic: false
West Frisian fry
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Yiddish yid
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: fusional
  • Templatic: false
Zulu zul
  • 2016 Shared Task Splits: no
  • 2017 Shared Task Splits: no
  • Typology: agglutinative
  • Templatic: false

Coming Attractions!

The following languages are in the process of being annotated according to the UniMorph specification.

Language ISO 639-3
!Xóõ nmn
  • Typology: fusional
  • Templatic: false
  • Info: wikipedia
  • Type: living
Afrikaans afr
  • Typology: fusional
  • Templatic: false
  • Info: wikipedia
  • Type: living
Ancient Greek grc
  • Typology: fusional
  • Templatic: false
  • Info: wikipedia
  • Type: historical
Aragonese arg
  • Typology: fusional
  • Templatic: false
  • Info: wikipedia
  • Type: living
Aramaic arc
  • Typology: fusional
  • Templatic: true
  • Info: wikipedia
  • Type: ancient
Buriat bua
Chechen che
Classical Nahuatl nci
Corsican cos
Egyptian Arabic arz
Gagauz gag
Hausa hau
Hittite hit
Inuktitut iku
Istriot ist
Japanese jpn
Jèrriais nrf
Kalaallisut kal
Kirghiz kir
Korean kor
Ladino lad
Limburgan lim
Luxembourgish ltz
Macedo-Romanian rup
Malagasy mlg
Malay msa
Malayalam mal
Mandarin Chinese cmn
Middle Dutch dum
Mirandese mwl
Northern Tiwa twf
Ojibwa oji
Old Dutch odt
Old Norse non
Old Portuguese pto
Old Provençal pro
Panjabi pan
Romansh roh
Romany rom
Sardinian srd
Saterfriesisch stq
Serbian srp
Sicilian scn
Skolt Sami sms
Swiss German gsw
Tswana tsn
Uighur uig
Võro vro
Walloon wln
Wymysorys wym
Yucatec Maya yua