How to compute accuracy of HTR models

About Transkribus

Transkribus is a comprehensive solution for the digitisation, AI-powered text recognition, transcription and searching of historical documents. Find out more about Transkribus here

Introduction

This guide shows you how to compute the accuracy of different models, how to examine the accuracy of an automatic transcription in more detail, and how to evaluate the accuracy of a recognition model on a sample set of your specific material.

Compute accuracy 

You can measure the accuracy of your model on specific pages of your Training and Validation Sets with the “Compute Accuracy” feature in the “Tools” tab. To do so, you first need to generate an HTR transcript. 

As “Reference”, choose a page version that has been transcribed correctly (Ground Truth: a manual transcription as close to the original text as possible). To get the most meaningful value, it is best to use pages from a sample set that have not been used in training and are therefore new to the model. Using pages from the Validation Set is also an option, but less ideal. Using pages from the Training Set is not a good idea, because the model has already seen them and the resulting CER (Character Error Rate) values will be lower than the model would achieve on new material.

As “Hypothesis”, choose the version that was automatically generated with an HTR model and whose quality you would like to test.

You can change the versions to be compared by clicking on the grey button next to “Reference” and “Hypothesis”; in the window that appears, double-click the desired version of the document. The versions that can be selected for “Reference” and “Hypothesis” are the different versions of your document that were created whenever a job was run or a transcription was saved.

Figure 1 “Compute Accuracy” within the tab “Tools”
Figure 2 Choosing the right version by double-clicking

Options to check the results of automated transcriptions

Compare text versions

If you click on “Compare Text Versions” you will get a visual representation of what the HTR model transcribed correctly and incorrectly.

Figure 3 Compare Text Versions

Please note that even if only one character is wrong, the whole word is marked in red. In green, the word is shown as it is written in the Ground Truth transcription. In the passages without colour the recognised text is identical with the Ground Truth. 
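
A rough idea of how such word-level highlighting can be produced is sketched below in Python. This is only an illustration using the standard difflib module, not the code the Expert Client actually uses; the [RED]/[GREEN] markers stand in for the colours of the comparison view.

    import difflib

    def show_word_diff(ground_truth: str, hypothesis: str) -> str:
        """Mark whole words that differ, the way the colour view does."""
        gt_words, hyp_words = ground_truth.split(), hypothesis.split()
        out = []
        for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=gt_words, b=hyp_words).get_opcodes():
            if tag == "equal":
                out += gt_words[i1:i2]                              # no colour: identical to the Ground Truth
            else:
                out += [f"[RED {w}]" for w in hyp_words[j1:j2]]     # recognised word, at least one character wrong
                out += [f"[GREEN {w}]" for w in gt_words[i1:i2]]    # word as written in the Ground Truth
        return " ".join(out)

    print(show_word_diff("the quick brown fox", "the quiek brown fox"))
    # the [RED quiek] [GREEN quick] brown fox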

Compare

This is the quickest way to check accuracy. To access it, click on “Compare…”.
First, make sure the right versions have been selected in the upper section of the window that appears, then hit the “Compare” button. The result will be shown in the lower section of the window after a few seconds.

Figure 4 Results 

The values are calculated for the page you have currently loaded in the background. In the example image, we have a CER of 2.34% on that page, which means that 97.66% of the characters in the automated transcript are correct. 
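
The CER is, in essence, the character-level edit distance between Hypothesis and Reference (insertions, deletions and substitutions) divided by the number of characters in the Reference. Transkribus computes this for you; the short Python sketch below only illustrates the standard calculation so the percentage is easier to interpret.

    def char_error_rate(reference: str, hypothesis: str) -> float:
        """Levenshtein distance between the two texts, divided by the Reference length."""
        prev = list(range(len(hypothesis) + 1))
        for i, ref_char in enumerate(reference, start=1):
            curr = [i]
            for j, hyp_char in enumerate(hypothesis, start=1):
                cost = 0 if ref_char == hyp_char else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1] / len(reference)

    cer = char_error_rate("Ground Truth text of the page", "Ground Trvth text of the pagc")
    print(f"CER {cer:.2%}, accuracy {1 - cer:.2%}")   # two wrong characters out of 29 -> CER 6.90%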

By double-clicking the date and time in the “Created” column of the simple comparison tab, you will automatically arrive at the “Advanced Statistics” window. Here you will find more detailed figures, and the results can be exported to an Excel file.

Figure 5 Advanced Statistics 

The overview shows two tables: the upper one contains the “Overall” values, i.e. the averages of the recognition results across all pages of the document. In the table below you can find the values for the individual pages. This way you can compare the results of different pages, and by double-clicking a line you will arrive at the text comparison, where you can check which words or text passages were challenging.

Note: The weighting of pages for the “Overall” value is calculated based on the number of recognised words on a page. 
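
In other words, the “Overall” figure is a weighted average rather than a plain mean of the page values, with pages that contain more recognised words counting more. The Python snippet below illustrates this with made-up numbers; it is only a sketch of the weighting described in the note above.

    # Made-up per-page results: (page, CER, number of recognised words on that page)
    pages = [(1, 0.0234, 310), (2, 0.0410, 295), (3, 0.0150, 120)]

    # Word-weighted average: pages with more recognised words contribute more to "Overall"
    overall = sum(cer * words for _, cer, words in pages) / sum(words for _, _, words in pages)
    print(f"Overall CER: {overall:.2%}")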

Advanced compare

When opening the “Compare” window you can choose another tab called “Advanced Compare”. 

Figure 6 Advanced Compare 

With “Advanced Compare”, you can check the accuracy of several pages at once by adding the pages you would like to evaluate (e.g. 1-6). By clicking on the button with the three dots on the far right you can choose individual pages.

After you start the accuracy check by clicking “Compare”, the results are shown in the table below; by double-clicking the value in the “Created” column, you will again arrive at the “Advanced Statistics” window.

Compare Samples

The “Compare Samples” functionality is useful if you are planning a bigger recognition project and would like to evaluate which model to choose before you run it on the whole document. This comparison feature chooses random lines from the sample document and tests the performance of the model on these lines. 
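
Conceptually, the sampling step does something like the following sketch; the data layout is invented purely for illustration and is not how Transkribus stores lines internally.

    import random

    # Hypothetical inventory of all lines in the set-aside sample documents, as (page, line) pairs
    all_lines = [(page, line) for page in range(1, 51) for line in range(1, 31)]

    # "Compare Samples" works on a random selection of lines, e.g. 500 of them
    sample = random.sample(all_lines, k=500)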

It makes sense to put some pages aside at the beginning in order to use them as sample documents. This is advantageous, as the material the model will be tested on has not been seen before and therefore the evaluation result will be more reliable. 

The “Compare Samples” functionality is also situated within the “Tools” tab in the “Compute Accuracy” section. To open it, click on “Compare Samples” and under “Create New Samples” fill out the required information. 

Figure 7 “Create New Samples” window

Under “Nr. of lines for sample” you define how many lines you would like to test; 500 is a recommended average. The more lines you enter here, the lower the variation in the result and the more precise the estimate will be. For a large project with many pages it might be reasonable to use 1,000 lines; for a very small trial, 100 lines may already be enough. Here too, as with so many things, a trial-and-error approach works best, since the right number always depends on your individual goal.
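
The reason more lines give a steadier result is the usual sampling effect: the uncertainty of the estimated CER shrinks roughly with the square root of the number of sampled lines. The Python sketch below illustrates this, under the simplifying assumption that the per-line CER values behave like independent draws.

    import statistics

    def cer_confidence_interval(line_cers: list[float]) -> tuple[float, float]:
        """Approximate 95% confidence interval for the mean CER of a random line sample."""
        mean = statistics.mean(line_cers)
        sem = statistics.stdev(line_cers) / len(line_cers) ** 0.5   # standard error of the mean
        return mean - 1.96 * sem, mean + 1.96 * sem

    # Quadrupling the number of sampled lines roughly halves the width of this interval,
    # which is why 500 lines give a noticeably tighter estimate than 100.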

With the “Baseline length threshold” you can control the minimum length of the sampled lines, which is practical if your material contains many short lines, as often happens with tables in newspapers. You can, for example, require that a line be at least 20% of the full line width; for handwritten material with only one column this step is usually not necessary. With the “Keep line text” option, the text already present in your documents is kept, so after creating the sample you only need to correct the lines instead of transcribing them from scratch.
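
The threshold can be pictured as a simple filter on the candidate lines before sampling. The field names below are invented for the illustration, and the exact reference width Transkribus measures against is not spelled out in this guide.

    def eligible_lines(lines: list[dict], min_fraction: float = 0.20) -> list[dict]:
        """Keep only lines whose baseline is at least min_fraction of the reference (full line) width."""
        return [ln for ln in lines if ln["baseline_length"] >= min_fraction * ln["reference_width"]]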

From the list on the left-hand side, choose the collection and document the sample should consist of and add them via the “Add to Sample Set” button. Then click on “Create sample”. Transkribus will now randomly choose the defined number of lines from the selected documents.

The next step is to load the sample document (you can find it in your collection) and manually transcribe the line snippets (if you haven’t kept the text as described above). There is only one line per page, so in most cases the transcribing will be quick. When you have finished one line, jump to the next page of the sample document to proceed.

When you have finished transcribing, run the model you would like to test on the sample document; this produces the transcription that you can then compare with the “Compare Samples” feature.

To do so, open the “Compute Sample Error” tab in the “Compare Samples” window and choose the document you would like to evaluate. Then click on “Compute” to start the job. As soon as “Completed” appears in the “Status” column, you can double-click the cell in the “Created” column to view the results.

Figure 8 Generating results with the “Compare Samples” feature