This model is based on more than 50 manuscripts (ca. 500 pages) containing religious texts and travelogues in prose and verse from the 15th and 16th centuries. The texts are written mainly in Gothic cursive and Bastarda, with some Textualis and Kurrent. They cover various Middle High German and Middle Low German dialects.
Transcription guidelines:
- abbreviations are expanded
- s-forms are normalised
- diacritical marks are mostly kept (owing to a change in the edition guidelines, there may be some inconsistencies)
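When comparing your own transcriptions with the output of this model, it can help to bring them into line with the guidelines above first. The following is a minimal sketch, not part of the project's tooling: the function name `normalise_s_forms` and the mapping `S_FORMS` are hypothetical, and it assumes that "s-forms are normalised" means that long s (ſ) is transcribed as a plain s. Diacritical marks are left untouched.

```python
import unicodedata

# Hypothetical mapping (an assumption, not taken from the edition guidelines):
# long s and the long-s-t ligature are collapsed to plain "s" / "st".
S_FORMS = str.maketrans({
    "\u017f": "s",   # ſ  LATIN SMALL LETTER LONG S
    "\ufb05": "st",  # ﬅ  LATIN SMALL LIGATURE LONG S T
})


def normalise_s_forms(text: str) -> str:
    """Return text with s-forms normalised and diacritics preserved."""
    # NFC composes base letter + combining mark where a precomposed
    # character exists, so equivalent spellings compare as equal.
    text = unicodedata.normalize("NFC", text)
    return text.translate(S_FORMS)


if __name__ == "__main__":
    print(normalise_s_forms("da\u017f i\u017ft g\u016ft"))
```

Abbreviation expansion is not covered here, since it depends on the context of each abbreviation and cannot be reduced to a character mapping.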
The ground truth was created by the project “Narrative Vermittlung religiösen Wissens. Edition und Kommentierung geistlicher Vers- und Prosatexte des 13. bis 16. Jahrhunderts” at the Universities of Cologne and Tübingen, which is funded by the DFG, and by the project “Edition der deutschen Übersetzung der ‚Voyages‘ des Jean de Mandeville durch Otto von Diemeringen” (originally funded by the DFG and the Fritz Thyssen-Stiftung).
The model was trained in cooperation with the University Library of Tübingen and the project OCR-BW, which was funded by the MWK Baden-Württemberg.