Open Language Data Initiative


The contents of this card can be edited in the source repository.

Dataset card for Norwegian Bokmål (radical variety)

Description

FLORES+ dev and devtest sets in Norwegian Bokmål (radical variety).

License

CC-BY-SA-4.0

Attribution

@inproceedings{maehlum-etal-2025-improved,
    title = "Improved {N}orwegian {B}okm{\r{a}}l Translations for {FLORES}",
    author = "M{\ae}hlum, Petter  and
      N{\ae}ss Evensen, Anders  and
      Scherrer, Yves",
    editor = "Haddow, Barry  and
      Kocmi, Tom  and
      Koehn, Philipp  and
      Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.wmt-1.86/",
    pages = "1124--1132",
    ISBN = "979-8-89176-341-8",
    abstract = "FLORES+ is a collection of parallel datasets obtained by translation from originally English source texts. FLORES+ contains Norwegian translations for the two official written variants of Norwegian: Norwegian Bokm{\r{a}}l and Norwegian Nynorsk. However, the earliest Bokm{\r{a}}l version contained non-native-like mistakes, and even after a later revision, the dataset contained grammatical and lexical errors. This paper aims at correcting unambiguous mistakes, and thus creating a new version of the Bokm{\r{a}}l dataset. At the same time, we provide a translation into Radical Bokm{\r{a}}l, a sub-variety of Norwegian which is closer to Nynorsk in some aspects, while still being within the official norms for Bokm{\r{a}}l. We discuss existing errors and differences in the various translations and the corrections that we provide."
}

Language codes

Additional language information

The radical version is based on the official Bokmål dictionary (https://ordbokene.no/nob/) and guided by the Association of Radical Bokmål (https://bokmal.no/).

Workflow

The data was corrected based on the most recent FLORES+ Bokmål translations. The corrections were made by two native Norwegian Bokmål writers with high proficiency in English and prior translation experience.

Additional guidelines