Transkribus volunteer tackles Danish handwriting

March 12, 2018
HTR models, Success stories, Transkribus

There are now thousands of Transkribus users working with documents of all dates, languages and formats. Today we would like to highlight some of the great work on the first Automated Text Recognition models for Danish handwriting.

Vagn Mørkeberg Christiansen is a retired volunteer at the Faxe Municipality Archives in Denmark. The archives were interested in using Transkribus to open up a collection of early twentieth-century minutes for transcription and searching. Vagn was invited to undertake this experiment.

Vagn used Transkribus to create training data for Automated Text Recognition by transcribing a few hundred pages from a collection of minutes from the parish of Braaby. These minutes were written between 1912 and 1931 by J. P. Jensen and O. Christov, who were both chairmen of the local council. Both individuals wrote relatively clearly, although the documents contain a few complications such as abbreviations and similarities between different characters.

Page of J. P. Jensen’s handwriting from 1913. Image courtesy of the Faxe Municipality Archives, Denmark.

At the latest count, Vagn has transcribed around 325 pages in Transkribus. These pages were used to create three text recognition models for the two different hands in the collection.

The first model was trained on 17,500 words of Jensen’s writing and the results were promising. Automated transcripts generated with this model reached an average Character Error Rate of 7.7%.

The next two models were trained on Christov’s writing, the first with around 16,000 words and the second with some 23,000 words. Happily, the additional training data brought a significant improvement: the average Character Error Rate of the automated transcripts fell from 9.9% to 4.7%, cutting the error rate by more than half.

Page of O. Christov’s handwriting from 1922. Image courtesy of the Faxe Municipality Archives, Denmark.

These figures represent very good results for Automated Text Recognition. Transcripts with these kinds of Character Error Rates can be easily read, searched and corrected.
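For readers unfamiliar with the metric, Character Error Rate is commonly defined as the number of single-character edits (insertions, deletions and substitutions) needed to turn the automated transcript into the correct one, divided by the length of the correct transcript. The Python sketch below illustrates that common definition only. It is an assumption that it matches how Transkribus calculates its figures, and the function names and sample strings are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning hypothesis into reference."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from hypothesis
                current[j - 1] + 1,      # extra character in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in an 11-character word.
reference = "Sogneraadet"   # correct transcript (illustrative)
hypothesis = "Sogneraodet"  # automated transcript (illustrative)
print(f"CER: {character_error_rate(reference, hypothesis):.1%}")
```

Under this definition, a Character Error Rate of 4.7% means that roughly one character in every twenty needs correcting.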

The improvement in the model trained to recognise Christov’s handwriting is also an excellent demonstration of the big data approach behind Transkribus. The more images and transcripts submitted to our platform as training data, the more accurate the recognition can become.

Vagn is enthusiastic about these results and plans to keep transcribing and training models. His next target is to retrain the Christov model once again – this time with 40,000 transcribed words!

If you would like to train your own Automated Text Recognition model in Transkribus, take a look at the How to Guides on the Transkribus wiki.

We are also working on a beta version of Transkribus Web, a streamlined web version of Transkribus where volunteers like Vagn will be able to transcribe training material for text recognition more easily.

We would like to thank Vagn Mørkeberg Christiansen for providing the information for this news post.


