Baseline

The baseline is the most important reference point for text recognition. It is a polyline that runs along the bottom of a handwritten text line: the letters sit on it and the descenders extend below it. Text can be segmented into lines and baselines automatically with CITlab Advanced LA. With complex layouts, however, you may need to make some manual corrections. When you add a baseline manually, you set its individual points yourself and finish with a double-click or by pressing Enter on the last point. Baselines can also be drawn vertically, and you can combine different line directions within one image or even within one text region (e.g. the typical “postcard layout”).
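To make "a polyline of individual points" concrete, here is a minimal sketch of how such a baseline could be handled as data. It assumes the points are stored as a string of the form "x1,y1 x2,y2 ...", which is the convention PAGE XML uses for its points attribute. The function names parse_points and polyline_length are hypothetical and only serve this illustration.

```python
from math import hypot


def parse_points(points: str) -> list[tuple[int, int]]:
    """Parse an 'x1,y1 x2,y2 ...' string into a list of (x, y) tuples.

    Assumption: integer pixel coordinates separated by spaces.
    """
    result = []
    for pair in points.split():
        x, y = pair.split(",")
        result.append((int(x), int(y)))
    return result


def polyline_length(points: list[tuple[int, int]]) -> float:
    """Sum the lengths of the straight segments between consecutive points."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# A baseline with three points: one horizontal segment, then one vertical.
baseline = parse_points("100,200 400,200 400,600")
length = polyline_length(baseline)
```

Each click while drawing adds one point to such a list, and the final double-click or Enter closes it.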

If you make changes to lines, always make them on the baselines. This matters because every line in your document also has a line region in the background. You can display these line regions with the item visibility button. Do not edit the line regions directly; they are adapted automatically when you change something at baseline level. A pop-up will ask whether you would like to change the parent line as well; please confirm this.

Note: If you draw vertical baselines manually, always draw them in the reading direction; otherwise the HTR will not work on that baseline later on.
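The direction rule can be sketched as a small check on the point order. This is an illustration only, not part of Transkribus: the helper names follows_reading_direction and orient are hypothetical, and the sketch assumes image coordinates in which y grows downward and a reading direction given as a simple (dx, dy) vector.

```python
def follows_reading_direction(points, direction):
    """Return True if the baseline runs from first to last point along `direction`.

    `direction` is a (dx, dy) vector, e.g. (0, 1) for top-to-bottom in
    image coordinates where y grows downward (an assumption of this sketch).
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = direction
    # A positive dot product means the overall run points the same way.
    return (x1 - x0) * dx + (y1 - y0) * dy > 0


def orient(points, direction):
    """Return the points in reading order, reversing them if necessary."""
    if follows_reading_direction(points, direction):
        return list(points)
    return list(reversed(points))


TOP_TO_BOTTOM = (0, 1)

# A vertical baseline that was drawn bottom-to-top by mistake.
drawn = [(50, 400), (50, 250), (52, 100)]
fixed = orient(drawn, TOP_TO_BOTTOM)
```

Here the baseline was drawn upward although the text reads downward, so the sketch reverses the point order. In the tool itself, the equivalent is to redraw the baseline starting where the text begins.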

Figure 1 Find Text Regions
Figure 2 Add Baselines manually
Figure 3 Layout Structure
Figure 4 Baseline under text line
