Dataset card

Description

FLORES+ dev set in Tuvan (Tyvan)

License

CC-BY-SA-4.0

Attribution

The translation was contributed by the team of Tyvan.ru.

@inproceedings{wmt24-tuvan,
    title="Enhancing {Tuvan} Language Resources through the {FLORES} Dataset",
    author="Ali Kuzhuget and Airana Mongush and Nachyn-Enkhedorzhu Oorzhak",
    booktitle = "Proceedings of the Ninth Conference on Machine Translation",
    month = nov,
    year = "2024",
    address = "Miami, USA",
    publisher = "Association for Computational Linguistics"
}

Language codes

ISO 639-3: tyv
ISO 15924: Cyrl
Glottocode: tuvi1240
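
As an illustration of how these codes are typically combined, the sketch below builds a FLORES-style language identifier from the ISO 639-3 and ISO 15924 codes listed above. It assumes the common `lang_Script` naming convention (for example `eng_Latn`). The helper name `flores_code` is hypothetical and not part of any official tooling.

```python
def flores_code(iso639_3: str, iso15924: str) -> str:
    """Join a language code and a script code as `lang_Script`.

    Assumption: the identifier follows the FLORES-style convention of a
    lowercase ISO 639-3 code, an underscore, and a title-case ISO 15924 code.
    """
    lang = iso639_3.strip().lower()
    script = iso15924.strip().title()
    if len(lang) != 3 or not lang.isalpha():
        raise ValueError(f"expected a 3-letter ISO 639-3 code, got {iso639_3!r}")
    if len(script) != 4 or not script.isalpha():
        raise ValueError(f"expected a 4-letter ISO 15924 code, got {iso15924!r}")
    return f"{lang}_{script}"


# Codes taken from this card.
TUVAN = flores_code("tyv", "Cyrl")
print(TUVAN)  # tyv_Cyrl
```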

Additional language information

The Tuvan language is fairly homogeneous: the differences between the dialects spoken in the Republic of Tyva are minor. The small Tuvan-speaking communities in the People's Republic of China and Mongolia, however, have many distinctive features.

Some additional resources in Tuvan are below:

The first Tuvan-Russian AI translator and bidirectional dictionaries: https://tyvan.ru/
A project containing important data for Russian-Tuvan and Tuvan-Russian translation: https://github.com/Agisight/TyvaData
Russian-Tuvan parallel data set with 50k translations: https://tyvan.ru/machineLearning/rus_tyv_parallel_50k.tsv
Tyvan.ru iOS app: https://itunes.apple.com/ru/app/tyv-rus/id1023486602?mt=8
Тыва-орус сөстүк – Android tyv-rus dictionary app: https://play.google.com/store/apps/details?id=ru.tuvlin.tyv_rus_android
Tuvan Wikipedia: https://tyv.wikipedia.org/
A Russian community dedicated to helping people digitize and preserve their languages: Языки разные - код один ("Different languages, one code")
The LANGO.TO project, developed by a team of NLP enthusiasts, which aims to preserve and promote the languages of minority peoples: https://lango.to/
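
For readers who want to work with a tab-separated parallel file such as the 50k Russian-Tuvan set listed above, here is a minimal parsing sketch. The column layout is an assumption (source sentence in the first column, target sentence in the second, no header row); the actual file has not been inspected here, so check it before relying on this. The function name `read_parallel_tsv` and the sample rows are placeholders.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_parallel_tsv(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from a tab-separated text stream.

    Assumption: column 0 holds the source sentence and column 1 the target.
    Rows with fewer than two columns or with an empty side are skipped.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue
        src, tgt = row[0].strip(), row[1].strip()
        if src and tgt:
            yield src, tgt


# Placeholder rows standing in for real data.
sample = io.StringIO(
    "source sentence 1\ttarget sentence 1\n"
    "\n"
    "source sentence 2\ttarget sentence 2\n"
)
pairs = list(read_parallel_tsv(sample))
print(len(pairs))  # 2
```

To read a downloaded copy, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_parallel_tsv`.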

Workflow

The data was translated from Russian by several translators, all native speakers of Tuvan: Моңгуш Салим (telegram: @renegenone), Ооржак Людмила, Оңгай-оол Чодураа, and Күжүгет Али (telegram: @Agilight). Many of the translations were double-checked by Ali Kuzhuget (Күжүгет Али).

Contacts

Feel free to join in and discuss any Tuvan (Tyvan) data and ideas.

Ali Kuzhuget, telegram: @Agilight. Git: https://github.com/Agisight/

Aira Mongush: https://airamongush.com/

David Dale, telegram: @cointegrated.