Dataset card
Description
FLORES+ dev set in Meadow Mari
License
CC-BY-SA-4.0
Attribution
Language codes
- ISO 639-3: mhr
- ISO 15924: Cyrl
- Glottocode: gras1239
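The codes above can be kept together as one small record. The sketch below is illustrative only: it assumes the common FLORES-style convention of joining the ISO 639-3 language code and the ISO 15924 script code with an underscore, and the `LanguageCodes` class and `flores_tag` method are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageCodes:
    """Hypothetical container for the identifiers listed in this card."""

    iso_639_3: str   # language code, e.g. "mhr"
    iso_15924: str   # script code, e.g. "Cyrl"
    glottocode: str  # Glottolog identifier, e.g. "gras1239"

    def flores_tag(self) -> str:
        # Assumption: language and script codes are joined with an underscore,
        # with the language in lowercase and the script in title case.
        return f"{self.iso_639_3.lower()}_{self.iso_15924.title()}"


MEADOW_MARI = LanguageCodes(iso_639_3="mhr", iso_15924="Cyrl", glottocode="gras1239")
print(MEADOW_MARI.flores_tag())  # prints "mhr_Cyrl"
```

Keeping the three identifiers in one immutable record avoids retyping them when filtering or naming files per language.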
Additional language information
Mari is a Uralic language spoken by about 451,000 people, mainly in the Mari El Republic in the west of the Russian Federation, and also in Bashkortostan, Tatarstan, Udmurtia, Perm and other parts of the Russian Federation. There are three standard forms of Mari: Hill Mari, Meadow Mari and Northwestern Mari, with significant differences between them. Eastern Mari is a fourth dialect that shares the written standard with Meadow Mari but differs from it phonologically.
Some additional resources in Meadow Mari are listed below:
- A parallel corpus with Russian: https://huggingface.co/datasets/AigizK/mari-russian-parallel-corpora
- A monolingual corpus composed of various genres: https://huggingface.co/datasets/mari-lab/mari-monolingual-corpus
Workflow
The data was translated from Russian by one translator, a native speaker of the target language. All of the data was checked by a second, independent translator.