Dataset card for Norwegian Bokmål (moderate variety)
Description
FLORES+ dev and devtest sets in Norwegian Bokmål (moderate variety).
License
CC-BY-SA-4.0
Attribution
@inproceedings{maehlum-etal-2025-improved,
title = "Improved {N}orwegian {B}okm{\r{a}}l Translations for {FLORES}",
author = "M{\ae}hlum, Petter and
N{\ae}ss Evensen, Anders and
Scherrer, Yves",
editor = "Haddow, Barry and
Kocmi, Tom and
Koehn, Philipp and
Monz, Christof",
booktitle = "Proceedings of the Tenth Conference on Machine Translation",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.wmt-1.86/",
pages = "1124--1132",
ISBN = "979-8-89176-341-8",
abstract = "FLORES+ is a collection of parallel datasets obtained by translation from originally English source texts. FLORES+ contains Norwegian translations for the two official written variants of Norwegian: Norwegian Bokm{\r{a}}l and Norwegian Nynorsk. However, the earliest Bokm{\r{a}}l version contained non-native-like mistakes, and even after a later revision, the dataset contained grammatical and lexical errors. This paper aims at correcting unambiguous mistakes, and thus creating a new version of the Bokm{\r{a}}l dataset. At the same time, we provide a translation into Radical Bokm{\r{a}}l, a sub-variety of Norwegian which is closer to Nynorsk in some aspects, while still being within the official norms for Bokm{\r{a}}l. We discuss existing errors and differences in the various translations and the corrections that we provide."
}
Language codes
- ISO 639-3: nob
- ISO 15924: Latn
- Glottocode: norw1259
Additional language information
Workflow
The data was corrected starting from the most recent FLORES+ Bokmål translations. The corrections were made by two native writers of Norwegian Bokmål who are highly proficient in English and have prior professional experience in translation.
Additional guidelines
The corrections were kept on the conservative side, as in the original translations, to maintain a clear difference between this dataset and the Radical Bokmål dataset.