Dataset card
Description
FLORES+ dev set in Tuvan (Tyvan)
License
CC-BY-SA-4.0
Attribution
The translation was contributed by the Tyvan.ru team.
@inproceedings{wmt24-tuvan,
title = "Enhancing {Tuvan} Language Resources through the {FLORES} Dataset",
author = "Ali Kuzhuget and Airana Mongush and Nachyn-Enkhedorzhu Oorzhak",
booktitle = "Proceedings of the Ninth Conference on Machine Translation",
month = nov,
year = "2024",
address = "Miami, USA",
publisher = "Association for Computational Linguistics"
}
Language codes
- ISO 639-3: tyv
- ISO 15924: Cyrl
- Glottocode: tuvi1240
Additional language information
The Tuvan language is fairly homogeneous: the differences between the dialects spoken in the Republic of Tyva are minor. Small Tuvan-speaking communities also live in the People's Republic of China and Mongolia, and their varieties have many distinctive features.
Additional resources for Tuvan:
- Tyvan.ru, the first Tuvan-Russian AI translator, with bidirectional dictionaries: https://tyvan.ru/
- TyvaData, a project with data for Russian-to-Tuvan and Tuvan-to-Russian translation: https://github.com/Agisight/TyvaData
- Russian-Tuvan parallel dataset with 50k translations: https://tyvan.ru/machineLearning/rus_tyv_parallel_50k.tsv
- Tyvan.ru iOS app: https://itunes.apple.com/ru/app/tyv-rus/id1023486602?mt=8
- Тыва-орус сөстүк ("Tuvan-Russian dictionary"), an Android tyv-rus dictionary app: https://play.google.com/store/apps/details?id=ru.tuvlin.tyv_rus_android
- Tuvan Wikipedia: https://tyv.wikipedia.org/
- Языки разные - код один ("Different languages, one code"), a Russian community dedicated to helping people digitize and preserve their languages
- LANGO.TO, a project developed by a team of NLP enthusiasts that aims to preserve and promote the languages of minority peoples: https://lango.to/
Workflow
The data was translated from Russian by several native speakers of Tuvan: Моңгуш Салим (telegram: @renegenone), Ооржак Людмила, Оңгай-оол Чодураа, and Күжүгет Али (telegram: @Agilight). Many of the translations were double-checked by Ali Kuzhuget (Күжүгет Али).
Contacts
Feel free to get in touch to discuss Tuvan (Tyvan) data and ideas.
Ali Kuzhuget, telegram: @Agilight. GitHub: https://github.com/Agisight/
Aira Mongush: https://airamongush.com/
David Dale, telegram: @cointegrated.