Last update of this guide: 29/10/2020
This guideline explains how to use the PyLaia training feature to train a model to recognise printed or handwritten text in your documents. After the training, the model will help you to automatically transcribe and search your collection. The workflow for model training with PyLaia is basically the same as with HTR+. This guideline therefore focuses on the parameters that can be set for PyLaia training. If you have more general questions on model training and how it is done in Transkribus, you can find more information here: How to Train and Apply Handwritten Text Recognition Models in Transkribus.
Download the Transkribus Expert Client, or make sure you are using the latest version:
Transkribus and the technology behind it are made available via the following projects and sites:
Transkribus guidelines on other topics can be found here:
The Transkribus Platform is provided by the European Cooperative READ-COOP SCE. Transkribus was developed as part of the Horizon 2020 project READ under grant agreement No. 674943.
The READ project received the Horizon Impact Award 2020 as one of the projects with the highest impact.
Figure 1: “Text Recognition”-section within the “Tools”-tab to access the PyLaia training
Figure 2: Train-interface
The parameters for PyLaia can be found by opening the “Train”-window and then the “PyLaia”-tab.
Figure 3: PyLaia parameters
The epochs follow the same logic as for HTR+. To start with, it makes sense to stick to the default setting of 250. Please be aware that too high a number of epochs will slow down the training.
The “Early Stopping” value of 20 means that if the CER on the Validation Set does not decrease within 20 epochs, the training will be stopped.
NOTE: this is important here and for training in general: the Validation Set needs to be varied and should ideally contain all types of elements found in the documents of the training set. If there is little or no variation in the Validation Set, the model may stop too early. Therefore, if your Validation Set is rather small, increase the “Early Stopping” value to prevent the training from stopping before it has seen all the training data. In short: do not cut corners on the Validation Set.
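The early-stopping behaviour described above can be sketched as follows. This is an illustrative sketch of the general technique, not PyLaia's actual code; the function name `early_stop_epoch` is made up for this example:

```python
# Illustrative sketch of "Early Stopping" (not PyLaia's implementation):
# training halts once the validation CER has not improved for
# `patience` consecutive epochs.
def early_stop_epoch(val_cers, patience=20):
    best = float("inf")
    since_improvement = 0
    for epoch, cer in enumerate(val_cers, start=1):
        if cer < best:
            best = cer
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return epoch  # training would stop here
    return len(val_cers)      # ran through all epochs

# Example: CER improves until epoch 3, then plateaus for 25 epochs.
cers = [0.40, 0.30, 0.25] + [0.26] * 25
print(early_stop_epoch(cers, patience=20))  # stops at epoch 23
```

This also illustrates the note above: if the Validation Set is unrepresentative, the CER can plateau early by accident, so a larger patience value gives the training more time.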
It is possible to add a base model to your training. If you choose this option, the neural networks will learn more quickly and you will save time. To be of benefit, the base model needs to be similar to the writing it is supposed to recognise. A base model is also likely to improve the quality of your recognition results; however, this is not always guaranteed and has to be tested for the specific case.
One big benefit of working with base models is that they make it possible to start with a smaller number of training pages, which means that the transcription workload is reduced.
To use a base model, you simply need to choose the desired one with the “Choose…” button next to “Base Model:”.
Figure 4: Adding a base model
The “Learning Rate” defines the increment from one epoch to the next, i.e. how fast the training proceeds. With a higher value, the CER will go down faster. BUT: the higher the value, the higher the risk that details are overlooked.
This value is adaptive and will be adjusted automatically during training; the training is nevertheless influenced by the value it is started with. You can go with the default setting here.
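To see why the starting value matters, here is a minimal sketch of the learning-rate trade-off, using plain gradient descent on f(x) = x² rather than anything PyLaia-specific (the function `descend` is invented for illustration):

```python
# Minimal sketch (not PyLaia code) of the learning-rate trade-off:
# gradient descent on f(x) = x^2, whose gradient is 2x.
def descend(lr, steps=20, x=1.0):
    for _ in range(steps):
        x -= lr * 2 * x   # one gradient step of size lr
    return x

print(descend(0.01))  # small rate: moves toward 0 only slowly
print(descend(0.4))   # larger rate: converges much faster
print(descend(1.1))   # too large: overshoots and diverges
```

The same intuition applies to training: a higher rate makes the CER fall faster, but past a certain point the updates overshoot and the model misses details instead of converging.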
We have had some cases where the pre-processing took too much time. If this happens to you, you can switch the “Image Type” to “Compressed”.
You can proceed in the following way: start the training with “Original” and check the progress of the pre-processing every now and then with the “Jobs”-button. In case it gets stuck, you can cancel the job and restart it with the “Compressed” setting.
You can open the advanced parameters for PyLaia by clicking on the “Advanced parameters”-button at the bottom of the standard PyLaia parameters within the “PyLaia”-tab.
Figures 5 and 6: Advanced parameters
Deslant: choose this option for cursive writing in order to straighten it. Leave this option out for printed documents: if they contain cursive passages in addition to the normal print characters, the effect can backfire.
Deslope: allows more variation at the baselines, i.e. more tolerance for baselines that are not exactly horizontal but slanting.
Stretch: this option is for narrow writing, in order to widen it.
Enhance: a window that moves along the baselines in order to optimise passages that are difficult to read. This is useful if there is “noise” in the document.
Enhance window size: this setting refers to the option just explained and therefore only needs to be set if you would like to use “Enhance”. It defines the size of the window.
Sauvola enhancement parameter: please stick to the default setting here.
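For background, Sauvola thresholding is a standard local binarisation method, which is the kind of enhancement described above. The sketch below shows the usual textbook formula; exactly how Transkribus maps the “Sauvola enhancement parameter” onto this formula is not documented here, so treat this purely as illustration (the function name and default values are conventional, not taken from Transkribus):

```python
# Background sketch of the standard Sauvola threshold formula
# (illustrative, not Transkribus code): each pixel is binarised
# against a local threshold computed from the mean `mean` and
# standard deviation `std` inside the sliding window.
def sauvola_threshold(mean, std, k=0.2, r=128.0):
    # k is the sensitivity parameter, r the dynamic range of std.
    return mean * (1 + k * (std / r - 1))

# A low-contrast window (small std) gets a threshold well below its
# mean, which suppresses faint background speckle ("noise").
print(sauvola_threshold(mean=120.0, std=10.0))
```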
Line height: value in pixels; if you need to increase the pixel height of the line images, you can do this here. 100 is a good value to go for. Attention: if the value is too high, it might lead to an “out of memory” error. You can work around this error by lowering the value of the “Batch size” (top left in the advanced parameters window), e.g. by half. Please be aware that the lower this value, the slower the training will be. The slow-down of the training related to the batch size should be improved with the new version of PyLaia, which will set the batch size automatically.
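A rough back-of-envelope sketch of why halving the batch size helps against out-of-memory errors. The numbers and the function name are purely illustrative, not PyLaia internals:

```python
# Illustrative only (not PyLaia internals): the raw input memory for
# one batch grows with the batch size and the image dimensions, so
# halving the batch size roughly halves the memory needed per step.
def batch_memory_mb(batch_size, line_height, max_width, bytes_per_px=4):
    return batch_size * line_height * max_width * bytes_per_px / 1e6

print(batch_memory_mb(16, 100, 6000))  # e.g. 38.4 MB of raw input
print(batch_memory_mb(8, 100, 6000))   # halved batch: 19.2 MB
```

The same scaling explains why raising “Line height” can trigger the error in the first place: memory grows with the image size just as it does with the batch size.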
Line x-height: this setting applies to the descenders and ascenders. If you set this value, the “Line height” parameter will be ignored.
Please don’t change the following parameters:
Features surrounding polygon
Features surrounding polygon dilate
Left/right padding: 10 (default) means that 10 pixels will be added. This is useful if you are worried that parts of the line could be cut off.
Max width: the maximum width a line can reach; the rest will be cut off. 6000 (default) is already a high value. If you have huge pages, you can increase this value further.
The following parameters are intended for those who are familiar with machine learning and the modification of neural networks. Therefore, these parameters are not explained further here.
Batch size: the number of pages that are processed at once on the GPU. You can change this value by entering another number.
Use_distortions True: the training set is artificially extended in order to increase its variation and in this way make the model more robust. If you are working with even writing and good scans, you do not need this option. To deactivate it, please write “False” instead of “True”.
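Conceptually, such distortions apply small random geometric changes (slight rotation, shear, etc.) to the training line images. The toy sketch below shows the idea on a single coordinate; it is pure illustration, and the function `distort_point` is invented for this example, not part of PyLaia:

```python
import math
import random

# Toy sketch of a "distortion" (not PyLaia's implementation): each
# training sample is randomly rotated and sheared a little, so the
# model sees more variation than the original scans provide.
def distort_point(x, y, angle_deg, shear):
    a = math.radians(angle_deg)
    # rotate around the origin, then shear horizontally
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    return xr + shear * yr, yr

random.seed(0)
angle = random.uniform(-2, 2)      # small random rotation in degrees
shear = random.uniform(-0.1, 0.1)  # small random horizontal shear
print(distort_point(100.0, 20.0, angle, shear))
```

Because the changes are small, the distorted lines remain readable training material; they simply are no longer pixel-identical to the scans, which is what makes the model more robust.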
The validation set will be saved to the collection from which the training was started; that is also where the recognition is processed. After the automated recognition, you can measure the accuracy of your model with the “Compute Accuracy” function, which you can find within the “Tools”-tab.
We would like to thank the many users who have contributed their feedback to help improve the Transkribus software.