
Text Region

To generate an HTR transcript, you need to segment your documents into text regions, lines and baselines. By default, a Text Region is a rectangle enclosing all of the handwritten text in the image. You can, however, adapt a Text Region to the general layout by adding Control Points, which turns the rectangle into a polygon.
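
To make the rectangle-to-polygon idea concrete, here is a minimal sketch in Python. It is not Transkribus code: the `TextRegion` class, its method names, and the pixel coordinates are all hypothetical and only model a region as an ordered list of corner points.

```python
from dataclasses import dataclass

Point = tuple[int, int]  # (x, y) in image pixels


@dataclass
class TextRegion:
    """Hypothetical model of a Text Region as an ordered list of outline points."""
    points: list[Point]

    @classmethod
    def rectangle(cls, left: int, top: int, right: int, bottom: int) -> "TextRegion":
        # Default shape: four corners, listed clockwise from the top-left.
        return cls([(left, top), (right, top), (right, bottom), (left, bottom)])

    def add_control_point(self, index: int, point: Point) -> None:
        # Insert a new vertex before position `index`; the outline gains one corner.
        self.points.insert(index, point)

    def coords_string(self) -> str:
        # Serialize the outline as "x,y x,y ..." pairs.
        return " ".join(f"{x},{y}" for x, y in self.points)


region = TextRegion.rectangle(100, 50, 900, 1200)
# Pull the right-hand edge inwards, e.g. to leave room for a marginal note.
region.add_control_point(2, (700, 600))
print(region.coords_string())
```

With one extra Control Point the region is no longer a rectangle but a five-cornered polygon that follows the layout more closely.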

Usually, the automatic CITlab Advanced Layout Analysis, in its standard setting, recognizes a single Text Region on an image, together with the corresponding baselines.
However, some layouts call for several Text Regions, for example when a page contains marginal notes, footnotes, or similar recurring elements. As long as these text areas, which differ in content and structure, sit in a single Text Region, the layout analysis simply counts the lines from top to bottom. The resulting Reading Order reflects only where a text is located graphically on the page, not where it belongs in terms of content (for example, an insertion). Correcting an automatically generated but unsatisfactory Reading Order can be time-consuming. You can avoid the problem by creating several Text Regions, each of which keeps its related texts and lines together, as in a box.
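
The effect on the Reading Order can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the actual layout analysis: the `Line` class, the region labels, and the pixel positions are invented for illustration, and lines are ordered purely by their vertical position.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    y: int        # vertical position of the baseline in pixels (assumed)
    region: str   # hypothetical label of the Text Region the line belongs to


def reading_order_single_region(lines):
    # One region for the whole page: lines are counted top to bottom.
    return [line.text for line in sorted(lines, key=lambda line: line.y)]


def reading_order_by_region(lines, region_order):
    # Several regions: each region is read top to bottom, one region after another.
    ordered = []
    for region in region_order:
        in_region = [line for line in lines if line.region == region]
        ordered.extend(line.text for line in sorted(in_region, key=lambda line: line.y))
    return ordered


page = [
    Line("main 1", 100, "main"),
    Line("note 1", 130, "margin"),
    Line("main 2", 160, "main"),
    Line("note 2", 190, "margin"),
    Line("main 3", 220, "main"),
]

print(reading_order_single_region(page))
# ['main 1', 'note 1', 'main 2', 'note 2', 'main 3']
print(reading_order_by_region(page, ["main", "margin"]))
# ['main 1', 'main 2', 'main 3', 'note 1', 'note 2']
```

In the single-region case the marginal note is interleaved with the main text. With two regions, each text stays together, which is the "box" effect described above.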

Figure 1 Layout Analysis – Find Text Regions
Figure 2 Layout Structure
Figure 3 Text Regions in Document
