Open Language Data Initiative



Dataset card for Norwegian Bokmål (moderate variety)

Description

FLORES+ dev and devtest sets in Norwegian Bokmål (moderate variety).

License

CC-BY-SA-4.0

Attribution

@inproceedings{maehlum-etal-2025-improved,
    title = "Improved {N}orwegian {B}okm{\r{a}}l Translations for {FLORES}",
    author = "M{\ae}hlum, Petter  and
      N{\ae}ss Evensen, Anders  and
      Scherrer, Yves",
    editor = "Haddow, Barry  and
      Kocmi, Tom  and
      Koehn, Philipp  and
      Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.wmt-1.86/",
    pages = "1124--1132",
    ISBN = "979-8-89176-341-8",
    abstract = "FLORES+ is a collection of parallel datasets obtained by translation from originally English source texts. FLORES+ contains Norwegian translations for the two official written variants of Norwegian: Norwegian Bokm{\r{a}}l and Norwegian Nynorsk. However, the earliest Bokm{\r{a}}l version contained non-native-like mistakes, and even after a later revision, the dataset contained grammatical and lexical errors. This paper aims at correcting unambiguous mistakes, and thus creating a new version of the Bokm{\r{a}}l dataset. At the same time, we provide a translation into Radical Bokm{\r{a}}l, a sub-variety of Norwegian which is closer to Nynorsk in some aspects, while still being within the official norms for Bokm{\r{a}}l. We discuss existing errors and differences in the various translations and the corrections that we provide."
}

Language codes

Additional language information

Workflow

The data was corrected starting from the most recent FLORES+ Bokmål translations. The corrections were made by two native writers of Norwegian Bokmål who have high proficiency in English and prior professional experience in translation.

Additional guidelines

The corrections stay on the conservative side of the Bokmål norm, as the original translations did. This keeps the dataset clearly distinct from the Radical Bokmål dataset.