Open Language Data Initiative

Dataset card

Description

Seed data in Standard Moroccan Tamazight

License

CC-BY-SA-4.0

Attribution

@inproceedings{seed-23,
    title = {Small Data, Big Impact: Leveraging Minimal Data for Effective Machine Translation},
    author = {Maillard, Jean and Gao, Cynthia and Kalbassi, Elahe and Sadagopan, Kaushik Ram and Goswami, Vedanuj and Koehn, Philipp and Fan, Angela and Guzmán, Francisco},
    booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    year = {2023},
    address = {Toronto, Canada},
    publisher = {Association for Computational Linguistics},
    pages = {2740--2756},
    url = {https://aclanthology.org/2023.acl-long.154},
}

Language codes

Additional language information

Reference dictionary: IRCAM’s Dictionnaire Général de la Langue Amazighe Informatisé.

Workflow

This data was originally released as part of the NLLB-Seed dataset, where it was incorrectly labeled as tzm_Tfng. Following community feedback and additional quality assessment, it was relabeled as zgh_Tfng. Please refer to the paper cited under Attribution for further information.
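
If you still hold a copy of the data under the older label, the relabeling can be applied with a small lookup. The following is only a minimal sketch: the names `LEGACY_TO_CURRENT`, `normalize_lang_code`, and `split_lang_code` are hypothetical helpers introduced for illustration and are not part of any released tooling. It assumes labels follow the `language_Script` pattern seen in the two codes above.

```python
# Hypothetical mapping from the legacy NLLB-Seed label to the corrected one.
# Only the single relabeling described in this card is included.
LEGACY_TO_CURRENT = {"tzm_Tfng": "zgh_Tfng"}


def normalize_lang_code(code: str) -> str:
    """Return the corrected label for a legacy code; leave other codes unchanged."""
    return LEGACY_TO_CURRENT.get(code, code)


def split_lang_code(code: str) -> tuple[str, str]:
    """Split a label such as 'zgh_Tfng' into its (language, script) parts."""
    language, script = code.split("_", 1)
    return language, script


if __name__ == "__main__":
    current = normalize_lang_code("tzm_Tfng")
    print(current, split_lang_code(current))
```

Codes that are not in the mapping pass through unchanged, so the helper can be applied to a mixed list of labels without affecting other languages.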

Additional guidelines