Open Language Data Initiative

Dataset card

Description

FLORES+ dev set in Tuvan (Tyvan)

License

CC-BY-SA-4.0

Attribution

The translation was contributed by the Tyvan.ru team.

@inproceedings{wmt24-tuvan,
    title = "Enhancing {Tuvan} Language Resources through the {FLORES} Dataset",
    author = "Ali Kuzhuget and Airana Mongush and Nachyn-Enkhedorzhu Oorzhak",
    booktitle = "Proceedings of the Ninth Conference on Machine Translation",
    month = nov,
    year = "2024",
    address = "Miami, USA",
    publisher = "Association for Computational Linguistics"
}

Language codes

Additional language information

The linguistic landscape of the Tuvan language is quite homogeneous: the differences between the dialects spoken in the Republic of Tyva are insignificant. There are also small Tuvan-speaking communities in the People's Republic of China and Mongolia, whose varieties have many distinctive features.

Some additional resources in Tuvan are below:

Workflow

The data was translated from Russian by several translators, all native speakers of the target language: Моңгуш Салим (telegram: @renegenone), Ооржак Людмила, Оңгай-оол Чодураа, and Күжүгет Али (telegram: @Agilight). Many of the translations were double-checked by Ali Kuzhuget (Күжүгет Али).

Contacts

Feel free to join in and discuss any Tuvan (Tyvan) data and ideas.

Ali Kuzhuget, telegram: @Agilight. Git: https://github.com/Agisight/

Aira Mongush: https://airamongush.com/

David Dale, telegram: @cointegrated.