Ever had trouble reading someone else’s handwriting?
Well, it may reassure you to know that it’s not only humans that have this problem, but computers too. While computers have been able to recognise and transcribe printed text for decades, recognising handwritten text has only been possible for the last few years. The technology that does this is known as handwritten text recognition or HTR, and it is the foundation of the Transkribus software.
Handwriting recognition is a fascinating technology, but it can be a bit complex to understand at the beginning. So if you are new to it, here is a quick introduction to what HTR is and what it does.
What is handwriting recognition or HTR?
Handwriting recognition is a type of technology that can be used to “read” the writing in images of handwritten documents. Let’s say you wrote an essay by hand back when you were at school and you now want to have that essay as digital text on your computer. With the right HTR software, you could take a photo of the essay, run it through the software and get that same essay as a digital text file, which can be downloaded and shared as necessary. This is the basic principle of handwriting recognition.
We should also be clear on the terminology here. The kind of handwriting recognition described above is known as “offline handwriting recognition”. That is because it involves images of text that has already been written. There is also “online handwriting recognition”. This is software that generates digital text from handwriting as you write it, usually with a tablet and stylus. As Transkribus was created to recognise the handwriting in historical documents that have already been written, it can only be used for offline handwriting recognition.
Why do computers find handwritten text harder to recognise than printed text?
With printed texts, there is a finite number of fonts that can be used — 200,000 of them, in fact. And while this may seem like a lot, it does at least mean you can program software to read them all.
Handwriting is a different ball game. The 6.5 billion people on this planet who can write each have their own style of handwriting and no two styles are exactly the same. To make things more complicated, how someone writes on an official form might be very different to how they write in their own diary, for example. The practically infinite number of possibilities also makes it impossible to program software to read them all. Read more about the difference between optical character recognition (OCR) and handwriting recognition in our insights blog.
How does handwriting recognition work?
So how does HTR technology recognise handwriting, if you can’t program it? Answer: HTR software isn’t programmed to recognise handwriting; it learns to recognise it. Software like Transkribus uses AI and deep learning to learn, over time, how to read and transcribe different types of handwriting, just like a human would. To do this, you first need an AI model. This is like a giant digital mind that has learnt the shapes and characteristics of thousands of handwritten words, letters and symbols. Most importantly, the model can also make educated guesses about characters it hasn’t seen before. The software detects the handwriting in the image and then uses the model’s knowledge to transcribe that handwriting into digital text. With the right model, you can achieve an almost perfect transcription at just the click of a button.
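To make the idea of "learning instead of programming" a little more concrete, here is a deliberately tiny sketch in Python. It is not how Transkribus works internally: real HTR models are deep neural networks trained on whole lines of text. The 3x3 "glyphs", the `train` and `recognise` functions and the nearest-prototype method are all assumptions chosen for illustration. What it does show is a model being built from labelled examples and then making an educated guess about a shape it has never seen.

```python
# Toy illustration only: real HTR uses deep neural networks on full line images.
# Each "glyph" is a hypothetical 3x3 bitmap flattened to 9 numbers (1 = ink).

def train(examples):
    """Average the bitmaps for each label to get one prototype per character."""
    sums, counts = {}, {}
    for label, pixels in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def recognise(model, pixels):
    """Return the label whose prototype is closest (squared distance)."""
    def distance(prototype):
        return sum((a - b) ** 2 for a, b in zip(prototype, pixels))
    return min(model, key=lambda label: distance(model[label]))


# Two handwritten variants each of "I" and "L" (made-up training data).
training_data = [
    ("I", [0, 1, 0,  0, 1, 0,  0, 1, 0]),
    ("I", [0, 1, 0,  0, 1, 0,  0, 1, 1]),
    ("L", [1, 0, 0,  1, 0, 0,  1, 1, 1]),
    ("L", [1, 0, 0,  1, 0, 0,  1, 1, 0]),
]
model = train(training_data)

# A sloppy "L" that does not appear in the training data.
unseen = [1, 0, 0,  1, 0, 0,  0, 1, 1]
guess = recognise(model, unseen)
print(guess)
```

Nobody wrote a rule saying what an "L" looks like. The prototypes come entirely from the examples, which is why adding more and better examples is what improves a model.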
There are lots of public models already available for Transkribus, covering different languages and types of handwriting such as German Kurrent, or English handwriting from the 18th and 19th centuries. But you can also create your own model. This is done by manually transcribing a certain number of documents, to train the “digital mind” to recognise the specific handwriting in your documents. You can then use this custom model to automatically transcribe the rest of your documents.
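Manual transcriptions are also useful for checking how good an automatic transcription is. One common way to put a number on "almost perfect" is the character error rate: the number of single-character edits needed to turn the automatic text into the manual one, divided by the length of the manual text. The sketch below is a minimal Python version under that assumption. The function names and the sample sentence are invented for illustration, and this is not a description of how Transkribus itself reports accuracy.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character insertions,
    deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edits needed, as a fraction of the reference (manual) text length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: a manual transcription and an automatic one with two slips.
manual = "Dear Sir, I write to you"
automatic = "Dear Sir, I wrile to yov"
cer = character_error_rate(manual, automatic)
print(f"{cer:.1%}")
```

A lower rate means the automatic transcription is closer to the manual one, so comparing rates is a simple way to judge whether a public model or a custom model suits a given set of documents better.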
What do people use HTR technology for?
There are many jobs that involve being able to quickly read handwritten documents. For example, historians and other researchers use historical handwritten documents to learn about the past. They often want to create digital versions of the documents, so that they can more easily analyse the documents and search the whole collection for certain words or terms. HTR technology is perfect for this.
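Once the documents exist as digital text, searching them becomes an ordinary text-processing task. As a rough sketch, assuming the transcriptions have been exported as plain text (the page names, the sample sentences and the `search` function here are all hypothetical), a whole-word search over a small collection could look like this in Python:

```python
import re

# Hypothetical transcriptions keyed by page name; the text is made up.
collection = {
    "letter_001": "Received of Mr. Brown the sum of ten pounds.",
    "letter_002": "The harvest was poor this year.",
    "letter_003": "Mr. Brown has not yet paid the remaining sum.",
}


def search(collection, term):
    """Return the names of pages that contain the term as a whole word,
    ignoring upper and lower case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [name for name, text in collection.items() if pattern.search(text)]


hits = search(collection, "brown")
print(hits)
```

Matching on whole words keeps a search for "sum" from also returning "summer". Historical spelling varies a lot, though, so a real project would probably want something more forgiving than exact matching.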
Archives and libraries are also places where HTR technology can be useful. Nowadays, archivists and librarians aim to offer digital versions of as many books and documents as possible. But manually transcribing the hundreds and thousands of volumes in a standard archive would take forever. Automatic transcription with HTR is a much quicker alternative, and makes it easier to publish all the documents online too.
But there are many other people who use HTR technology to make their lives easier: companies wanting to digitise their old files, developers wanting to incorporate handwriting recognition into custom software, and hobby genealogists who want to read their family’s old records. The list is endless!
Can I see an example of handwriting recognition in action?
If you want to see how handwriting recognition works, then simply upload a scanned image of handwritten text to the widget below, and see the technology in action.