UniMorph
The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. The goal of UniMorph is to annotate morphological data in a universal schema that allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and by a rendering of its inflectional form in terms of a bundle of morphological features from our schema. The specification of the schema is described here and in Sylak-Glassman (2016).
UniMorph is also available as a Python package: `pip install unimorph`
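As a rough illustration of the lemma-plus-feature-bundle idea described above, the sketch below parses entries in a three-column, tab-separated layout: lemma, inflected form, and a semicolon-separated feature bundle. The layout, the sample entries, and the names `Entry` and `parse_line` are assumptions made for this example; it does not use the `unimorph` package's API.

```python
from typing import FrozenSet, List, NamedTuple


class Entry(NamedTuple):
    """One inflected word: its lemma, surface form, and feature bundle."""
    lemma: str
    form: str
    features: FrozenSet[str]


def parse_line(line: str) -> Entry:
    """Parse one 'lemma<TAB>form<TAB>FEAT;FEAT;...' line (assumed layout)."""
    lemma, form, bundle = line.rstrip("\n").split("\t")
    # The bundle is unordered, so a frozenset is a natural representation.
    return Entry(lemma, form, frozenset(bundle.split(";")))


# Illustrative sample data, not copied from any UniMorph release.
sample = "run\trunning\tV;V.PTCP;PRS\nrun\tran\tV;PST\n"

entries: List[Entry] = [parse_line(l) for l in sample.splitlines() if l.strip()]

# Look up every form of a lemma that carries a given feature.
past_forms = [e.form for e in entries if e.lemma == "run" and "PST" in e.features]
print(past_forms)
```

Storing the bundle as a set rather than a string makes feature lookups independent of the order in which the tags were written.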
UniMorph Events
- SIGMORPHON 2022 Shared Task
- SIGMORPHON 2021 Shared Task
- SIGMORPHON 2020 Shared Task
- SIGMORPHON 2019 Shared Task
- CoNLL–SIGMORPHON 2018 Shared Task
- CoNLL–SIGMORPHON 2017 Shared Task
- SIGMORPHON 2016 Shared Task
Annotated Languages
The following 169 languages have been annotated according to the UniMorph schema. Missing parts of speech will be filled in soon.
| Language | ISO 639-3 | Forms | Paradigms | Parts of speech annotated (of nouns, verbs, adjectives) | Source | License |
|---|---|---|---|---|---|---|
| Adyghe | ady | 20475 | 1666 | ✔ ✔ | | |
| Afrikaans | afr | 309558 | 161749 | ✔ ✔ ✔ | ℒ | |
| Akan | aka | 4182 | 96 | ✔ | ℒ | |
| Albanian | sqi | 33483 | 589 | ✔ ✔ | | |
| Amharic | amh | 46224 | 2461 | ✔ ✔ ✔ | ℒ | |
| Ancient Greek | grc | 41593 | 2431 | ✔ ✔ | | |
| Arabic | ara | 140003 | 4134 | ✔ ✔ ✔ | | |
| Armenian | hye | 338461 | 7033 | ✔ ✔ ✔ | | |
| Ashaninka | cni | 10952 | 407 | ✔ ✔ | ℒ | |
| Assamese | asm | 94147 | 1877 | ✔ ✔ | ℒ | |
| Asturian | ast | 29797 | 436 | ✔ ✔ ✔ | | |
| Aymara | aym | 336341 | 3410 | ✔ ✔ | ℒ | |
| Azerbaijani | aze | 8004 | 340 | ✔ ✔ | | |
| Bashkir | bak | 12168 | 1084 | ✔ ✔ | | |
| Basque | eus | 11889 | 26 | ✔ | | – |
| Belarusian | bel | 16113 | 1027 | ✔ ✔ ✔ | | |
| Bengali | ben | 4443 | 136 | ✔ ✔ | | |
| Braj | bra | 1821 | 1246 | ✔ ✔ ✔ | ℒ | |
| Breton | bre | 2294 | 44 | ✔ ✔ | | |
| Bulgarian | bul | 55730 | 2468 | ✔ ✔ ✔ | | |
| Catalan | cat | 81576 | 1547 | ✔ | | |
| Cebuano | ceb | 618 | 97 | ✔ | ℒ | |
| Central Kurdish | ckb | 24317 | 274 | ✔ ✔ ✔ | | LGPLLR |
| Chewa | nya | 4370 | 227 | ✔ | ℒ | |
| Chichicapan Zapotec | zpv | 1164 | 379 | ✔ | Surrey | |
| Chichimeca-Jonaz | pei | 15120 | 123 | ✔ | Surrey | |
| Chukchi | ckt | 243 | 197 | ✔ ✔ | ℒ | |
| Classical Armenian | xcl | 97181 | 4300 | ✔ ✔ ✔ | | |
| Classical Syriac | syc | 31972 | 3299 | ✔ ✔ | ℒ | |
| Cornish | cor | 469 | 9 | ✔ ✔ | | |
| Crimean Tatar | crh | 7514 | 1230 | ✔ ✔ ✔ | | |
| Czech | ces | 50284287 | 824074 | ✔ ✔ ✔ | | |
| Dakota | dak | 3766 | 537 | ✔ | ℒ | |
| Danish | dan | 25503 | 3193 | ✔ ✔ | | |
| Dutch | nld | 55467 | 4993 | ✔ ✔ | | |
| Eastern Highland Chatino | cly | 4716 | 185 | ✔ | Surrey | |
| Egyptian Arabic | arz | 25394 | 6347 | ✔ ✔ ✔ | | |
| Eibela | ail | 2718 | 642 | ✔ ✔ | ℒ | |
| English | eng | 115523 | 22765 | ✔ | | |
| Estonian | est | 38215 | 886 | ✔ ✔ | | |
| Evenki | evn | 11408 | 4499 | ✔ ✔ ✔ | ℒ | |
| Faroese | fao | 45474 | 3077 | ✔ ✔ ✔ | | |
| Finnish | fin | 2490377 | 57642 | ✔ ✔ ✔ | | |
| French | fra | 367732 | 7535 | ✔ | | |
| Friulian | fur | 8071 | 168 | ✔ | | |
| Galician | gal | 36801 | 486 | ✔ | | |
| Georgian | kat | 74412 | 3782 | ✔ ✔ ✔ | | |
| German | deu | 179339 | 15060 | ✔ ✔ | | |
| Gothic | got | | | | | |
| Greenlandic | kal | 368 | 23 | ✔ | | |
| Gujarati | guj | 19404 | 6995 | ✔ ✔ ✔ | | |
| Gulf Arabic | afb | 32236 | 6707 | ✔ ✔ ✔ | | |
| Gã | gaa | 909 | 95 | ✔ | ℒ | |
| Haida | hai | 7040 | 41 | ✔ | | – |
| Hebrew | heb | 13818 | 510 | ✔ ✔ | | |
| Hiligaynon | hil | 1256 | 97 | ✔ | ℒ | |
| Hindi | hin | 54438 | 258 | ✔ | | |
| Hungarian | hun | 490394 | 13989 | ✔ ✔ | | |
| Icelandic | isl | 76915 | 4775 | ✔ ✔ | | |
| Indonesian | ind | 27714 | 3877 | ✔ ✔ ✔ | ℒ | |
| Ingrian | izh | 1099 | 50 | ✔ ✔ | | |
| Irish | gle | 107298 | 7464 | ✔ ✔ ✔ | | |
| Italian | ita | 509574 | 10009 | ✔ | | |
| Itelmen | itl | 2701 | 1636 | ✔ ✔ | ℒ | |
| Kabardian | kbd | 3092 | 250 | ✔ ✔ | | |
| Kannada | kan | 6402 | 159 | ✔ ✔ | | |
| Karelian | krl | 411277 | 10842 | ✔ ✔ | | |
| Kashubian | csb | 509 | 37 | ✔ | | |
| Kazakh | kaz | 40283 | 1755 | ✔ ✔ | | |
| Khakas | kjh | 1200 | 75 | ✔ | | |
| Khaling | klr | 156097 | 591 | ✔ | | LGPLLR |
| Kholosi | hsi | 174 | 48 | ✔ ✔ | ℒ | |
| Kodi | kod | 462 | 65 | ✔ ✔ ✔ | ℒ | |
| Kongo | kon | 828 | 200 | ✔ | ℒ | |
| Kunwinjku | gup | 307 | 73 | ✔ | ℒ | |
| Kyrgyz | kir | 5544 | 98 | ✔ | ℒ | |
| Ladin | lld | 180 | 7656 | ✔ ✔ | | |
| Latin | lat | 509182 | 17214 | ✔ ✔ ✔ | | |
| Latvian | lav | 136998 | 7548 | ✔ ✔ ✔ | | |
| Lingala | lin | 228 | 57 | ✔ | ℒ | |
| Lithuanian | lit | 34130 | 1458 | ✔ ✔ ✔ | | |
| Livonian | liv | 3987 | 203 | ✔ ✔ ✔ | | |
| Livvi | olo | 1199149 | 27676 | | VepKar | |
| Low German | nds | | | | | |
| Lower Sorbian | dsb | 20121 | 994 | ✔ ✔ ✔ | | |
| Ludian | lud | 11313 | 6751 | | VepKar | |
| Luganda | lug | 4895 | 89 | ✔ | ℒ | |
| Macedonian | mkd | 168057 | 10313 | ✔ ✔ ✔ | | |
| Malagasy | mlg | 644 | 159 | ✔ | ℒ | |
| Maltese | mlt | 3584 | 112 | ✔ ✔ | | |
| Manx | glv | 14 | 1 | | | |
| Maori | mao | 214 | 104 | ✔ | ℒ | |
| Mapudungun | arn | 783 | 26 | ✔ | | |
| Mezquital Otomi | ote | 33162 | 2028 | ✔ | Surrey | |
| Middle French | frm | 36970 | 603 | ✔ | | |
| Middle High German | gmh | 708 | 29 | ✔ ✔ | | |
| Middle Low German | gml | 1513 | 52 | ✔ ✔ ✔ | | |
| Modern Greek | ell | 199763 | 11906 | ✔ ✔ ✔ | | |
| Murrinhpatha | mwf | 1110 | 29 | ✔ | | |
| Navajo | nav | 12354 | 674 | ✔ ✔ | | |
| Neapolitan | nap | 1808 | 40 | ✔ | | |
| Norman | xno | 280 | 5 | ✔ | | |
| North Frisian | frr | 3204 | 51 | ✔ | | |
| Northern Kurdish | kmr | 216370 | 15083 | ✔ ✔ ✔ | | LGPLLR |
| Northern Sami | sme | 62677 | 2103 | ✔ ✔ ✔ | | |
| Norwegian Bokmål | nob | 19238 | 5527 | ✔ ✔ ✔ | | |
| Norwegian Nynorsk | nno | 15319 | 4689 | ✔ ✔ ✔ | | |
| O'odham | ood | 1628 | 370 | ✔ ✔ | ℒ | |
| Occitan | oci | 8316 | 174 | ✔ | | |
| Old Church Slavonic | chu | 4148 | 152 | ✔ | | |
| Old English | ang | 42425 | 1867 | ✔ ✔ ✔ | | |
| Old French | fro | 123374 | 1700 | ✔ ✔ | | |
| Old High German | goh | 7248 | 482 | ✔ ✔ | | |
| Old Irish | sga | 1089 | 49 | ✔ ✔ | | |
| Old Saxon | osx | 22287 | 863 | ✔ ✔ ✔ | | |
| Oromo | orm | 2046 | 92 | ✔ | ℒ | |
| Pashto | pus | 6945 | 395 | ✔ ✔ ✔ | | |
| Persian | fas | 37128 | 273 | ✔ | | |
| Plains Cree | cre | 9577 | 32 | ✔ | ℒ | |
| Polish | pol | 13882543 | 274550 | ✔ ✔ ✔ | ℒ | |
| Pomak | poma | 6557759 | 233533 | ✔ ✔ ✔ | ℒ | |
| Portuguese | por | 303996 | 4001 | ✔ | | |
| Quechua | que | 180004 | 1006 | ✔ ✔ ✔ | | |
| Romanian | ron | 80266 | 4405 | ✔ ✔ ✔ | | |
| Russian | rus | 473481 | 28068 | ✔ ✔ ✔ | | |
| Sakha | sah | 590765 | 5622 | ✔ ✔ | ℒ | |
| San Pedro Amuzgos Amuzgo | azg | 12204 | 332 | ✔ | Surrey | |
| Sanskrit | san | 33847 | 917 | ✔ ✔ | | |
| Scottish Gaelic | gla | 781 | 73 | ✔ ✔ | | |
| Seneca | see | 5460 | 140 | ✔ | ℒ | |
| Serbo Croatian | hbs | 840799 | 24419 | ✔ ✔ ✔ | | |
| Shipibo-Konibo | shp | 14588 | 2111 | ✔ | ℒ | |
| Shona | sna | 3030 | 86 | ✔ | ℒ | |
| Sierra Otomi | otm | 31380 | 1909 | ✔ | Surrey | |
| Slovak | slk | 28428612 | 366183 | ✔ ✔ ✔ | ℒ | |
| Slovenian | slv | 60110 | 2535 | ✔ ✔ ✔ | | |
| Sotho | sot | 494 | 26 | ✔ | ℒ | |
| Southern Kurdish | sdh | 189 | 1 | ✔ | ℒ | |
| Spanish | spa | 382955 | 5460 | ✔ | | |
| Swahili | swc | 14130 | 185 | ✔ ✔ ✔ | ℒ | |
| Swedish | swe | 78411 | 10553 | ✔ ✔ ✔ | | |
| Swiss German | gsw | 2067 | 145 | ✔ | ℒ | |
| Tagalog | tgl | 2912 | 344 | ✔ | ℒ | |
| Tajik | tgk | 77 | 75 | ✔ ✔ | | |
| Tatar | tat | 7832 | 1283 | ✔ ✔ ✔ | | |
| Telugu | tel | 1548 | 127 | ✔ ✔ | | |
| Tibetan | bod | 1355 | 1355 | ✔ | ℒ | |
| Tlatepuzco Chinantec | cpa | 7893 | 697 | ✔ | Surrey | |
| Turkish | tur | 275460 | 3579 | ✔ ✔ ✔ | | |
| Turkmen | tuk | 810 | 68 | ✔ | | |
| Tuvan | tyv | 586180 | 5032 | ✔ ✔ | ℒ | |
| Ukrainian | ukr | 20904 | 1493 | ✔ ✔ ✔ | | |
| Urdu | urd | 12572 | 182 | ✔ ✔ | | |
| Uyghur | uig | 8178 | 90 | ✔ | ℒ | |
| Uzbek | uzb | 37291 | 443 | ✔ ✔ | ℒ | |
| Venetian | vec | 18227 | 368 | ✔ | | |
| Veps (Vepsian) | vep | 815676 | 18618 | | VepKar | |
| Votic | vot | 1430 | 55 | ✔ | | |
| Võro | vro | 512 | 63 | ✔ | ℒ | |
| Welsh | cym | 10641 | 183 | ✔ | | |
| West Frisian | fry | 1429 | 85 | ✔ | | |
| Xibe | sjo | 3054 | 1892 | ✔ ✔ | ℒ | |
| Yaitepec Chatino | ctp | 3796 | 223 | ✔ | Surrey | |
| Yanesha | ame | 3767 | 327 | ✔ ✔ | ℒ | |
| Yiddish | yid | 7986 | 803 | ✔ ✔ ✔ | | |
| Yoloxóchitl Mixtec | xty | 3057 | 594 | ✔ | Surrey | |
| Zarma | dje | 84 | 27 | ✔ | ℒ | |
| Zenzontepec Chatino | czn | 1567 | 386 | ✔ | Surrey | |
| Zulu | zul | 49562 | 621 | ✔ ✔ ✔ | | |

