The “NOSCEMUS General Model” is tailored towards recognizing Latin prints from the early modern period. Although the model is designed to recognize Latin prints set in Antiqua-based typefaces, it is also capable of recognizing passages in Greek and passages set in (German) Fraktur.
In creating the Ground Truth, the following transcription guidelines were followed:
– ligatures (e.g. Æ or æ, Œ or œ) and standard abbreviations (e.g. -que, -us, -tur, …mm…, …nn…) were expanded
– long s (ſ) was transcribed as a normal s
– small caps were transcribed as majuscules
– special characters and diacritics (e.g. &, ë, ï or ę) were kept
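For users who want to compare their own diplomatic transcriptions with the model's output, the character-level part of the guidelines above can be sketched in Python. This is only an illustration, not part of the NOSCEMUS tooling: the function name `normalize_to_guidelines` and the mapping table are hypothetical, the choice to expand Æ and Œ to fully capitalized "AE" and "OE" is an assumption, and abbreviation expansion and small caps are not handled because they depend on context that a simple character mapping cannot capture.

```python
# Hypothetical mapping for illustration; not taken from the NOSCEMUS project.
# It covers only the ligature and long-s rules of the guidelines.
GUIDELINE_MAP = str.maketrans({
    "Æ": "AE",  # assumption: uppercase ligatures expand to two capitals
    "æ": "ae",
    "Œ": "OE",
    "œ": "oe",
    "ſ": "s",   # long s is transcribed as a normal s
})


def normalize_to_guidelines(text: str) -> str:
    """Expand ligatures and replace long s; leave all other characters as-is.

    Special characters and diacritics such as &, ë, ï or ę are absent from
    the mapping and therefore pass through unchanged.
    """
    return text.translate(GUIDELINE_MAP)


if __name__ == "__main__":
    sample = "Cæſar & poëta"
    print(normalize_to_guidelines(sample))
```

A plain translation table keeps the sketch easy to extend: further character-level rules can be added as entries without changing the function.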
The model was released by Stefan Zathammer and is based on training data drawn from the Digital Sourcebook of the NOSCEMUS project (https://transkribus.eu/r/noscemus/#).
If you use the NOSCEMUS model as a base model for your own model, or if your edition is based on a transcription made with the help of the NOSCEMUS model, you are kindly requested to mention the NOSCEMUS model.
The NOSCEMUS project (https://www.uibk.ac.at/projects/noscemus) has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 741374).