How to read French handwriting with AI

You can learn plenty about French history from reading books and watching documentaries. These kinds of sources are great for getting an overview of a topic. But it is primary sources such as birth registers, medieval manuscripts or personal letters that really get to the heart of a topic, giving us an unfiltered perspective on history and allowing us to draw our own conclusions about the events that took place.

However, handwritten documents like these are not always easy to read. Old handwriting can be notoriously tricky to decipher in any language, and French is no exception. In addition, many different scripts and writing styles have been used for French throughout history, from the medieval Carolingian script to modern-day cursive. This means you have to understand not only the language but also the script.

Primary sources, such as letters, are key for unlocking the past. Image generated by AI

In the past, reading such documents required extensive skills and knowledge. Today, AI-powered handwritten text recognition makes it possible to read and transcribe handwritten documents in French and many other languages without being an expert in historical French handwriting. In this post, we look at what makes French handwriting so hard to read and show how artificial intelligence platforms like Transkribus can help overcome these challenges.

A short history of French handwriting

One of the main obstacles in understanding old French handwriting is the number of scripts that have been used by French scribes throughout history. How French was written in the 15th century is very different from how it is written today. Your documents could be written in any number of scripts, for example:

Carolingian script

During the medieval period, handwriting in France was heavily influenced by the Carolingian script, developed under the rule of Charlemagne. This script was characterised by clear, legible letters with some ornamentation.

Italic script

During the Renaissance, French handwriting underwent changes influenced by the humanist movement. Humanist scholars advocated for a return to classical forms, which led to the development of new styles such as the Italic script, with its inclined, flowing strokes.

A typical example of 19th-century French handwriting. Image from “Bulliot, Bibracte et moi” project, via Transkribus

Secretary hand

In the 17th century, the French Secretary hand, a style of handwriting used for official documents and correspondence, became popular. This script was characterised by its legibility and formality and evolved into various forms over the centuries as new writing instruments were developed.

Cursive script / “Écriture cursive”

In the 19th century and early 20th century, the French developed a type of cursive handwriting specifically for educational purposes. This script, known as “écriture cursive,” emphasised fluidity and connectivity between letters. It became the standard handwriting taught in French schools and is the most common type of French handwriting today.

Reading handwritten documents in French without technology

Before the development of assistive technology such as handwritten text recognition, reading handwritten documents in any language was a challenge. As explained in the introduction, you not only needed to know the language but also the script the document was written in.

Of course, it is possible to learn how to read different scripts. You would need to start small, learning what a few letters looked like in the script. From there, you could start to decipher whole words, particularly common or expected words, such as “Cordialement” in a letter, or “Date de naissance” in a birth register.

Public registries are a mine of historical information. Image from Batz-sur-Mer Registre d’état civil, via Wikimedia Commons

The final step would be to decipher whole sentences and, subsequently, the entire content of the document. Having strong skills in French was important for this: if you understand 90% of the words in a sentence, the other 10% can often be guessed based on context.

How handwritten text recognition makes it easier to read documents

Machines have been able to read printed text for decades, thanks to optical character recognition (OCR) technology. However, because handwriting varies so much from one writer and period to the next, these OCR systems were of little use with handwritten text.

Around 10 years ago, a group of researchers, archivists and historians came together to develop a new technology for handwriting recognition, which could be used for the digitisation and transcription of handwritten documents. Being able to automatically transcribe large amounts of text allows researchers to extract data from sources much more quickly than with manual transcription, making research more efficient.

The result of this project was a technology called handwritten text recognition, or HTR. HTR platforms like Transkribus use machine learning, specifically neural networks, to learn how to read handwritten text from examples, much as a human reader would.

How to train an AI model in Transkribus

The platform does this by using AI models. Each model is a bit like a manual, telling Transkribus how to read a certain type of handwriting. For example, if you wanted to transcribe a collection of 19th-century handwritten texts in French, you would upload images of all the pages, and then tell the platform to transcribe them using the French Handwriting 19th century model. Transkribus would use the knowledge within that model to read the text in the images and create a digital transcription.

Transkribus uses AI to automatically transcribe handwritten text. Image from “Bulliot, Bibracte et moi” project, via Transkribus

What sets Transkribus apart is that it allows you to create your own handwriting recognition model and train the platform to read the specific handwriting in your documents. To do this, you need to upload a certain amount of “Ground Truth” training data, meaning pages that have already been transcribed with 100% accuracy. The platform uses the information in this data to create a new “manual”, or model, which can then be used to transcribe the rest of your documents. Although it can take a bit of time to create a custom model from scratch, in the long run it is almost always quicker than transcribing all your documents manually.

You can find out more about training AI models in our Help Center.

How accurate is handwriting recognition?

Accuracy is still one of the challenges in handwriting recognition. Human handwriting is extremely complex for machines to understand, and there is not yet a system that can transcribe documents without making any errors at all.

But some models come close. Each model is given a “character error rate”, or CER, which estimates the percentage of characters in a text that are likely to be transcribed incorrectly. A CER of 0% means a perfect, error-free transcription, while a CER of 100% means that essentially every character is wrong. The lower the CER, the better the model.

As a general rule, models with a CER of 10% or less will normally produce a transcription of sufficient quality for analysis or further research, with minimal post-editing required.
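To make the idea of a character error rate more concrete, here is a minimal sketch in Python. It assumes the common definition of CER, which is the edit distance between a correct reference transcription and the automatic transcription, divided by the length of the reference. This post does not say exactly how Transkribus calculates its figures, so treat the function names and the sample text below as illustrative assumptions rather than the platform's own method.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Count the fewest single-character insertions, deletions and
    substitutions needed to turn the hypothesis into the reference."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Return the CER as a percentage (assumed definition: edits / reference length)."""
    if not reference:
        raise ValueError("The reference transcription must not be empty.")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: the model drops one "s" from "naissance".
reference = "Date de naissance"
hypothesis = "Date de naisance"

cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.1f}%")
print("Within the 10% rule of thumb:", cer <= 10)
```

In this example one character out of 17 is wrong, so the transcription falls comfortably within the 10% rule of thumb described above.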

The CERs of the latest Transkribus models can be seen in the final column. Image via Transkribus

Which AI models are available for French handwriting?

There are several “public” AI models for French handwriting on Transkribus, which are available for all users.

French General Model

Suitable for a wide range of documents, this all-purpose model was trained on a variety of hands from different eras and can read both historical and modern handwriting. It has a CER of 7.8%.

You can try the model on this page.

French Handwriting 19th century

Known officially as “BBM Bulliot French C19th handwritten 2021”, this model was trained as part of the “Bulliot, Bibracte et moi” citizen science project. Its training dataset consisted of approximately 147,000 words and it has a CER of 8.2%. The model is also useful for other handwritten French documents from the same period.

You can try the model on this page.

The Text Titan I

This transformer-powered AI model is our go-to model for both handwritten and printed material in a range of languages, including French. It is therefore ideal for collections with many different types of material and script.

You can try the model by signing into your account at app.transkribus.org.

Medieval Scripts M2.4

This large model was trained on a wide variety of data from the medieval period and can be used not just for French but also for Dutch, German, Latin and Flemish texts. It has a CER of 7.1%.

You can try the model on this page.
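To get a feel for what these CER figures mean in practice, the short Python sketch below converts each published CER into an approximate number of characters you might need to correct. The sample length of 1,000 characters is an assumption chosen for easy arithmetic, and real results will vary with how closely your documents resemble the material each model was trained on.

```python
# CER values as quoted in this post (The Text Titan I is omitted because
# no CER is given for it here).
MODEL_CER_PERCENT = {
    "French General Model": 7.8,
    "French Handwriting 19th century": 8.2,
    "Medieval Scripts M2.4": 7.1,
}

# Assumption for illustration only: a sample of 1,000 characters.
SAMPLE_CHARACTERS = 1000


def expected_errors(cer_percent: float, character_count: int) -> int:
    """Estimate how many characters are likely to need correcting."""
    return round(character_count * cer_percent / 100)


for model_name, cer_percent in MODEL_CER_PERCENT.items():
    errors = expected_errors(cer_percent, SAMPLE_CHARACTERS)
    print(f"{model_name}: about {errors} corrections per {SAMPLE_CHARACTERS} characters")
```

All three models sit below the 10% threshold, which would correspond to 100 corrections in a sample of this size.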

French handwriting is transcribed using the “French Handwriting 19th century” model. Image from “Bulliot, Bibracte et moi” project, via Transkribus

How can I try Transkribus for myself?

Want to find out if Transkribus would work with your documents?

  • Go to app.transkribus.org and create an account.
  • Upload images of your documents.
  • Select a public model, such as the ones described above.
  • Let Transkribus create an automatic transcription.

Alternatively, you can test out Transkribus right now using Transkribus AI.

Thumbnail created with AI
