Success Story
Published: 3 years ago

Improved Text Recognition for Finnish Historical Newspapers with Transkribus

The National Library of Finland has reprocessed almost two million historical newspaper pages with the Transkribus automatic text recognition workflow in cooperation with READ-COOP. The greatly improved recognition results convinced the Library of the merits of this workflow, which was developed to its current state in the NewsEye project under the lead of the University of Innsbruck. Text recognition in general, and high-accuracy recognition in particular, is of immense importance for the quality and usability of digitized historical sources.

The material in this reprocessing project with READ-COOP comprised a little under two million pages of Finnish newspapers dating from 1771 to 1914. The materials are in Finnish and Swedish, reflecting the languages used in Finland during this period. All Finnish newspapers published in Finland, from the first newspaper in 1771 through the titles from 1914, along with a selection of newspapers from 1915 to 1918, have now been reprocessed.

Starting from summer 2021, the newly reprocessed newspapers will gradually replace the older versions, which have lower-quality optical character recognition results, in the publication and presentation system of the National Library of Finland. The Library will launch an information campaign regarding the quality improvements. We are also aiming to process more newspapers from 1914 onwards, but a decision on this will follow later.

The improvement in the text recognition results has been considerable, and we are currently calculating the exact figures. These will be published on https://digi.nationallibrary.fi.

The work in this cooperation has been financed by the EU’s European Regional Development Fund / Leverage for the 2014-2020 funding period.

Overview