GitHub
All of our datasets and source code are available for collaboration and use in our GitHub repositories.
You can find:
- A Python package for downloading and searching morphological paradigms
- Pre-trained morphological analyzers for several languages; for pre-trained models covering 1200 languages, see Nicolai and Yarowsky (2019).
- A tool for reannotating Universal Dependencies corpora into the UniMorph morphosyntactic schema, optimized for harmony between UD and UniMorph and hand-engineered for a number of languages. If you use it, please cite McCarthy et al. (2018).
Dataset creation tools
The majority of our data is extracted from Wiktionary. We provide tools for such extraction here. Revisions and pull requests are welcome.
Additional tools
The following software artifacts have been released for use on data annotated according to the UniMorph schema. Please let us know if you would like your software listed on this part of the website.