Open Language Data Initiative

Related projects

Currently, OLDI manages OLDI-Seed (an extension of the NLLB-Seed dataset) and FLORES+ (an extension of the FLORES-200 dataset). Below, we list other extensions and derivatives of these datasets that are not managed by OLDI but may still be relevant, as well as other interesting multiway parallel datasets.

Partial FLORES translations

There are at least two translations of FLORES that comprise less than one full dataset split. Because they cover languages for which no other translation benchmarks exist, they could still be of interest:

Other FLORES derivatives

Other massively parallel datasets