3 AI Models For Transcribing German Text In Fraktur, Kurrent and Sütterlin

February 17, 2023
Handwritten Text Recognition

If you regularly work with German historical documents then there are three types of German script that you are probably very familiar with: Fraktur, Kurrent, and Sütterlin. These scripts were used from the 16th century right up until the Second World War, covering several centuries of German and Central European history. However, nowadays, they are almost impossible to read for the untrained eye, making transcribing these kinds of documents a long and time-consuming process.

Thankfully, technology can now speed things up. Platforms such as Transkribus use AI models to recognise Fraktur, Kurrent, Sütterlin, and other scripts, and automatically create a digital version of the text. These digital versions can then be easily searched for certain words or phrases and easily shared with colleagues and the general public.

If you are new to using Transkribus to read historical documents in German, this post will introduce you to these three key scripts and show you three AI models that are ideal for transcribing them.

What is Fraktur?

The Fraktur font was used widely in German print from the early 16th century until it was outlawed by the Nazi Party in 1941. A form of black letter typeface, Fraktur letters are angular, rather than curved, and so it is often known in German as “gebrochene Schrift“, or “broken script”. Fraktur typefaces also often contain ligatures, most of which have their roots in German cursive handwriting.

How is it different to Kurrent?

In contrast to Fraktur, the “Kurrentschrift“, as it is known in German, is a type of handwritten script. It was also developed in the early 16th century and was then used up until the beginning of the 20th century when it was replaced by the newly developed Sütterlin script (see below). Until then, it was the standard handwriting that was taught in schools throughout Germany.

And what is Sütterlin?

As mentioned, the Sütterlin script was another type of German handwriting and was the successor to the Kurrent script. At the beginning of the 20th century, the Prussian Ministry of Science, Art and Culture decided it was time to update Kurrent with a form of handwriting that was easier to read. In 1911, they commissioned designer Ludwig Sütterlin to create such a script, which he gladly did. The Sütterlin script was first introduced into Berlin schools in 1914 and soon spread to become the dominant handwritten script throughout Germany. You can find out more information on Ludwig Sütterlin’s Wikipedia page.

3 AI models for reading Fraktur, Kurrent and Sütterlin

Transkribus German Handwriting M1

If there is one model that is useful for documents written in Kurrent and/or Sütterlin, it is this one. Trained with a whopping 3,610,922 words from a very diverse range of handwritten manuscripts, Transkribus German Handwriting M1 is able to transcribe almost any handwritten document with relative accuracy and without extra training. In addition to the Kurrent and Sütterlin documents, the training data also included some German-language documents written in Latin script, making it ideal for manuscripts containing multiple types of handwriting. For such a diverse model, it has a low CER of just 4.7%.

→ Go to model

German Fraktur 19th-20th Centuries

This AI model focuses on a particular type of Fraktur text: documents written in the 19th and 20th centuries. Developed by the Austrian National Library and the Newseye project, the model is based on 442,121 words from a wide variety of historical newspapers and publications. It also has a CER of just 1%, outperforming most standard OCR engines with these types of documents. However, please note that the model was trained exclusively on German-language documents, making it less suitable for Swedish or Finnish Fraktur, for example.

→ Go to model

German Kurrent 17th-18th Centuries

This Transkribus Kurrent model is what we sometimes call a “super model”: it is based on 1,840,000 words from a diverse set of documents, including council minutes of the Pomeranian government of Stralsund, the assessor votes of the Wismar High Court and various private letter collections. It was developed by the University of Greifswald, has a CER of 5.5% and is suitable for transcribing all manner of Kurrent documents from the 17th and 18th centuries.

→ Go to model

How do I use a public AI model with Transkribus?

Transkribus’ transcriptions are based on AI models. Each model has been trained to read a specific type of handwritten or printed text in a certain language, and often a certain time period or genre too.

If you want to transcribe a document with Transkribus, you first need to upload a scan of the document and then you choose a model. There are currently 94 public models available, which are all completely free to use. Transkribus will take the information stored in that model and apply it to your document, creating an instant transcription.

But what if there isn’t a model that is suitable for the text in your documents? Then you also have the chance to train your own. To do this, you need a series of pre-transcribed documents, collectively known as “Ground Truth”. The more ground-truth data you use to train your model, the more information it will contain and the more accurate it will be when transcribing new documents. To save time, many people use a public model as the base for their custom model and then fine-tune it with further Ground Truth.

For more information about models and how to train them, check out our How-to Guides.

Upload a document and give Transkribus a try:

SHARE THIS ARTICLE

Mapping the concerts of Beethoven and Haydn: the “Concert Life in Vienna” project

Some Transkribus projects finish with a complete digitised collection in Transkribus. Some take that digitised source and use it to ...

June 12, 2024

News, Transkribus

What is Carolingian Minuscule?

When you think of Carolingian (or Caroline) minuscule, Charlemagne and his vast Carolingian empire likely come to mind. While the ...

May 14, 2024

Uncategorized

AI models for reading Polish cursive and printed texts

Understanding historical documents is key to understanding history. But understanding historical documents in Polish can be a challenge. Not only ...

Cookie	Description	Duration
viewed_cookie_policy	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.	1 hour
PHPSESSID	This cookie is native to PHP applications. The cookie is used to store and identify a users' unique session ID for the purpose of managing user session on the website. The cookie is a session cookies and is deleted when all the browser windows are closed.	1 year

Cookie	Description	Duration
VISITOR_INFO1_LIVE	This cookie is set by Youtube. Used to track the information of the embedded YouTube videos on a website.	5 months
IDE	Used by Google DoubleClick and stores information about how the user uses the website and any other advertisement before visiting the website. This is used to present users with ads that are relevant to them according to the user profile.	2 years

Cookie	Description	Duration
GPS	This cookie is set by Youtube and registers a unique ID for tracking users based on their geographical location	30 minutes
tk_or	This cookie is set by JetPack plugin on sites using WooCommerce. This is a referral cookie used for analyzing referrer behavior for Jetpack	5 years
tk_r3d	The cookie is installed by JetPack. Used for the internal metrics fo user activities to improve user experience	3 days
tk_lr	This cookie is set by JetPack plugin on sites using WooCommerce. This is a referral cookie used for analyzing referrer behavior for Jetpack	1 year
_ga	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, camapign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assigns a randoly generated number to identify unique visitors.	2 years
_gid	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form.	1 day
matomo	For statistical analysis, we use “Matomo” on this website. This is an open source tool for web analysis. Matomo does not transmit data to servers outside the control of the READ-COOP. Matomo is deactivated when you visit our website. Only if you actively consent will your usage behaviour be recorded anonymously.	1 year

Cookie	Description	Duration
YSC	This cookies is set by Youtube and is used to track the views of embedded videos.	1 year
_gat	This cookies is installed by Google Universal Analytics to throttle the request rate to limit the colllection of data on high traffic sites.	1 minute

3 AI Models For Transcribing German Text In Fraktur, Kurrent and Sütterlin

What is Fraktur?

How is it different to Kurrent?

And what is Sütterlin?

3 AI models for reading Fraktur, Kurrent and Sütterlin

Transkribus German Handwriting M1

German Fraktur 19th-20th Centuries

German Kurrent 17th-18th Centuries

How do I use a public AI model with Transkribus?

For more information about models and how to train them, check out our How-to Guides.

Upload a document and give Transkribus a try:

Recent Posts

Mapping the concerts of Beethoven and Haydn: the “Concert Life in Vienna” project

What is Carolingian Minuscule?

AI models for reading Polish cursive and printed texts

The COOP

Products & Services

Useful information

Helpful resources

Community