AI models for reading Polish cursive and printed texts

Understanding historical documents is key to understanding history. But understanding historical documents in Polish can be a challenge. Not only is the language often difficult to understand but the handwriting can, at times, be almost impossible to read. The fact that the Polish alphabet has changed much over the last few centuries makes this particularly tricky.

However, nowadays, AI can help. Using AI-powered Handwritten Text Recognition (HTR) platforms such as Transkribus, you can automatically transcribe both handwritten and printed documents in a fraction of the time it would take to transcribe them manually. In this post, we are going to take a closer look at how to transcribe historical documents with AI, and the models Transkribus offers for documents in Polish.

Letters are one of the many valuable sources that can be transcribed with AI. © Jerzy Krzewicki via Wikimedia Commons

The benefit of digitising documents

There is something special about reading historical documents in their original form. The smell of the parchment, the careful page turns, and the feeling you get reading words that were written hundreds of years ago are quite magical.

While a digital version can never compete in those terms, there are many advantages to creating digital versions of historical documents:

  • It makes it possible for everyone to read them. Not everyone has the skills to read Polish handwriting from the 16th century, or the time to learn. A digital version means that everyone can understand the document’s content without specialist help reading Polish handwriting.
  • Digital versions can be easily shared online. Instead of going to the archive or museum to view the documents in person, interested persons can simply view them online from the comfort of their office or living room. This makes Polish language resources more accessible to more people, aiding collaboration and encouraging new perspectives on historical events.
  • Most importantly, digitising documents makes it easier to extract information. Let’s say you have a collection of birth records and you want to find all the records from a particular year. With paper documents, you would have to manually search through all the pages, scanning them for that year. With digital documents, you can simply type the year into the search bar and quickly find all the relevant documents. This is a smarter and more efficient way to work, saving time, effort, and money.
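To make the birth records example concrete, here is a minimal sketch in Python of searching transcribed text for a particular year. It assumes the transcriptions have already been exported as plain text; the file names, the sample sentences, and the function name find_records_by_year are all invented for illustration and are not part of Transkribus.

```python
import re


def find_records_by_year(records, year):
    """Return the names of records whose text mentions the given year."""
    # Match the year as a whole number, so 1887 does not match inside 18870.
    pattern = re.compile(rf"(?<!\d){year}(?!\d)")
    return [name for name, text in records.items() if pattern.search(text)]


# Hypothetical transcriptions, invented for illustration only.
records = {
    "page_001.txt": "Urodzony dnia 3 maja 1887 roku w Krakowie.",
    "page_002.txt": "Urodzona dnia 12 lipca 1891 roku we Lwowie.",
    "page_003.txt": "Ochrzczony w roku 1887, numer aktu 18870.",
}

matches = find_records_by_year(records, 1887)
print(matches)
```

Here the search returns the first and third pages, because both mention 1887 as a standalone number. A real collection would likely need more care, for example handling years written out in words or transcription errors in the digits.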

Digitising documents such as this diary makes them accessible for everyone. © Janina Turek’s Diary via Transkribus

How to automatically transcribe documents with AI

Before AI, transcribing historical documents was a time-consuming process. You needed someone who knew both the language and the cursive script, and who was prepared to spend months or even years manually transcribing the documents.

AI platforms such as Transkribus have revolutionised this process. Now, all you have to do is upload an image of the document, select a model (see below), and the platform will give you a digital version of the text in the document. You can then edit this transcription if required, download it, share it, or even publish it online with your very own Transkribus Site.

While Transkribus was developed for documents in historical cursive handwriting, it can also be used with printed texts and offers several advantages over conventional OCR systems.

Transkribus is an effective alternative to traditional OCR systems. © ZG ZPwN via Wikimedia Commons

AI models for transcribing Polish documents

As mentioned above, when transcribing with Transkribus, you need to select a “model”. This is a little bit like a manual, telling the platform how to transcribe a certain type of document.

Each model is trained to read either handwritten or printed documents in a certain language and from a certain time period. They can be finely tuned to the handwriting of a single person, or cover a wide range of handwriting styles from across history.

Transkribus does allow you to train your own AI models, tailored to your specific documents. However, if you are new to the platform, we would recommend using a public model trained by the Transkribus community. This should give you a fairly accurate transcription, without the need to prepare training data or test a model.

The Text Titan I model is capable of reading documents that contain both handwritten and printed texts. © Government of Poland via Wikimedia Commons

There are two public models for Polish language documents:

Polish General Model

This all-purpose model was trained on a very wide variety of handwritten Polish documents, both historical and modern, making it a good go-to model for Polish handwritten documents of different types.

You can find out more here.

Text Titan I

The Text Titan I is one of our new transformer-powered “Super Models”. These advanced models are the masters of multi-tasking and are able to transcribe handwritten and printed documents in many different languages and scripts, all at the same time. The Text Titan I is therefore ideal for diverse collections, such as those containing documents in Polish as well as other languages, and from different time periods.

The Text Titan I is only available to users with a Scholar, Team, or Organisation plan. You can find out more about the model here.

Try Transkribus now

Want to try Transkribus for yourself? Simply upload a Polish document to our demo version below and let the platform create an automatic transcription for you.

Thumbnail includes: Daukantas handwriting, 1857–1859. © Simonas Daukantas via Wikimedia Commons
