Open Language Data Initiative

Dataset card

Description

Seed data in Ligurian

License

CC-BY-SA-4.0

Attribution

@inproceedings{seed-23,
    title = {Small Data, Big Impact: Leveraging Minimal Data for Effective Machine Translation},
    author = {Maillard, Jean and Gao, Cynthia and Kalbassi, Elahe and Sadagopan, Kaushik Ram and Goswami, Vedanuj and Koehn, Philipp and Fan, Angela and Guzmán, Francisco},
    booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    year = {2023},
    address = {Toronto, Canada},
    publisher = {Association for Computational Linguistics},
    pages = {2740--2756},
    url = {https://aclanthology.org/2023.acl-long.154},
}

Language codes

Additional language information

The data is in the Genoese dialect, using traditional spelling as codified in the following reference dictionaries:

Reference grammar:

Workflow

This data was originally released as part of the NLLB-Seed dataset. It has since undergone minor spelling and syntactic fixes following community feedback and additional quality assessment. For further information, refer to the paper cited under Attribution.

Additional guidelines