Dataset card
Description
FLORES+ in Hong Kong Cantonese
License
CC-BY-SA-4.0
Attribution
@article{nllb-22,
title = {No Language Left Behind: Scaling Human-Centered Machine Translation},
author = {{NLLB Team} and Costa-jussà, Marta R. and Cross, James and Çelebi, Onur and Elbayad, Maha and Heafield, Kenneth and Heffernan, Kevin and Kalbassi, Elahe and Lam, Janice and Licht, Daniel and Maillard, Jean and Sun, Anna and Wang, Skyler and Wenzek, Guillaume and Youngblood, Al and Akula, Bapi and Barrault, Loic and Mejia-Gonzalez, Gabriel and Hansanti, Prangthip and Hoffman, John and Jarrett, Semarley and Sadagopan, Kaushik Ram and Rowe, Dirk and Spruit, Shannon and Tran, Chau and Andrews, Pierre and Ayan, Necip Fazil and Bhosale, Shruti and Edunov, Sergey and Fan, Angela and Gao, Cynthia and Goswami, Vedanuj and Guzmán, Francisco and Koehn, Philipp and Mourachko, Alexandre and Ropers, Christophe and Saleem, Safiyyah and Schwenk, Holger and Wang, Jeff},
year = {2022},
eprint = {arXiv:2207.04672},
}
Language codes
- ISO 639-3: yue
- ISO 15924: Hant
- Glottocode: xian1255
Additional language information
Workflow
This data was originally released as part of the FLORES-200 dataset. That original translation was found to be mostly in Mandarin, owing to the domain and register. Following feedback, it was retranslated to better match Hong Kong Cantonese. Please refer to the paper for further information.