Open Language Data Initiative

Papers

Since their first releases, the FLORES and Seed datasets have been well documented in research papers. Their community extensions, including those produced under the WMT shared tasks in 2024 and 2025, have often resulted in publications as well. All of these papers are listed below in reverse chronological order.

2025

2024

2023

2022 and earlier