Transkribus volunteer tackles Danish handwriting

There are now thousands of Transkribus users working with documents of every period, language and format.  Today we would like to highlight some great work on the first Automated Text Recognition models for Danish handwriting.

Vagn Mørkeberg Christiansen is a retiree who volunteers at the Faxe Municipality Archives in Denmark.  The archives wanted to use Transkribus to open up a collection of early twentieth-century minutes for transcription and searching, and they invited Vagn to undertake this experiment.

Vagn used Transkribus to create training data for Automated Text Recognition by transcribing a few hundred pages from a collection of minutes from the parish of Braaby.  These minutes were written between 1912 and 1931 by J. P. Jensen and O. Christov, who both served as chairman of the local council.  Both wrote relatively clearly, although the documents contain a few complications, such as abbreviations and characters that closely resemble one another.

Page of J. P. Jensen’s handwriting from 1913. Image courtesy of the Faxe Municipality Archives, Denmark.

At the latest count, Vagn has transcribed around 325 pages in Transkribus.  These pages were used to create three text recognition models for the two different hands in the collection.

The first model was trained on 17,500 words of Jensen’s writing, and the results were promising.  Automated transcripts generated with this model had an average Character Error Rate of 7.7%, or roughly 8 character errors for every 100 characters of text.

The next two models were trained on Christov’s writing, the first with around 16,000 words and the second with some 23,000 words.  Happily, the extra training data brought a significant improvement: the average Character Error Rate of the automated transcripts fell from 9.9% with the first model to 4.7% with the second, less than half the original rate.

Page of O. Christov’s handwriting from 1922. Image courtesy of the Faxe Municipality Archives, Denmark.

These figures represent very good results for Automated Text Recognition.  Transcripts with Character Error Rates this low, where fewer than one character in ten is wrong, can be easily read, searched and corrected.
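For readers curious about what the Character Error Rate actually measures, here is a minimal Python sketch of the usual definition: the number of character substitutions, deletions and insertions needed to turn the automated transcript into the correct one, divided by the number of characters in the correct transcript. This is a generic illustration, not Transkribus's own implementation; the function names and sample strings are hypothetical, and Transkribus may normalise text (for example whitespace or punctuation) differently before comparing.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn hyp into ref (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # ref character missing from hyp
                            curr[j - 1] + 1,      # extra character in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: generic CER = edit distance / reference length.
    Not Transkribus's implementation; no text normalisation is applied."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the model misreads one letter ("g" as "q") in 16 characters.
print(character_error_rate("Braaby Sogneraad", "Braaby Soqneraad"))  # → 0.0625
```

One misread letter in 16 characters gives a CER of 6.25%. By the same measure, the 4.7% achieved by the second Christov model corresponds to just under five character errors per 100 characters of correct text.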

The improvement in the model trained to recognise Christov’s handwriting is also an excellent demonstration of the big data approach behind Transkribus.  The more images and transcripts submitted to our platform as training data, the more accurate the recognition can become.

Vagn is enthusiastic about these results and plans to keep transcribing and training models.  His next target is to retrain the Christov model once again – this time with 40,000 transcribed words!

If you would like to train your own Automated Text Recognition model in Transkribus, take a look at the How to Guides on the Transkribus wiki.

We are also working on a beta version of Transkribus Web, a streamlined web version of Transkribus where volunteers like Vagn will be able to transcribe training material for text recognition more easily.

We would like to thank Vagn Mørkeberg Christiansen for providing the information for this news post.
