Presenting the Noscemus public model

We are happy to present one of our public models: “Noscemus GM v1”, released by Stefan Zathammer as part of the Innsbruck-based project NOSCEMUS (Nova Scientia: Early Modern Scientific Literature and Latin). The model reads texts set in Antiqua-based typefaces from the 16th, 17th and 18th centuries and outperforms most standard OCR engines. Although it is tailored to transcribing (Neo-)Latin texts, it also gives convincing results for other languages such as French, Italian or English. The Noscemus model can therefore help not only Neo-Latinists but anyone whose research deals with large text corpora from the Early Modern period.

The model is based on training data from the project's Digital Sourcebook, comprising about 1,000 pages. To keep the model as flexible as possible, standardization in the transcription process was kept to a minimum. Normalizations were made only in the following cases: ligatures (e.g. ae, oe, ct, ff) and abbreviations (e.g. -que, -us, -tur…mm…) were expanded, long s (ſ) was transcribed as a normal s, and small caps were transcribed as majuscules.
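If you want to compare your own transcriptions with the model's output, it can help to apply the same kind of normalization to your text first. The following is a minimal Python sketch, not the project's own tooling: the function name `normalize_transcription` and the replacement table are hypothetical, and the sketch assumes that ligatures, long s and small caps are encoded as dedicated Unicode characters in your text. The ct ligature and the abbreviations are left out, because the sketch makes no assumption about how they are encoded.

```python
import unicodedata

# Illustrative replacement table (an assumption, not the project's actual rule set).
EXPANSIONS = {
    "\u00e6": "ae",  # æ ligature
    "\u0153": "oe",  # œ ligature
    "\ufb00": "ff",  # ﬀ ligature
    "\u017f": "s",   # long s (ſ)
}

# Unicode names of small-capital letters start with this prefix,
# e.g. "LATIN LETTER SMALL CAPITAL A".
SMALL_CAP_PREFIX = "LATIN LETTER SMALL CAPITAL "


def normalize_transcription(text: str) -> str:
    """Expand a few ligatures, replace long s, and map small caps to majuscules."""
    out = []
    for ch in text:
        if ch in EXPANSIONS:
            out.append(EXPANSIONS[ch])
            continue
        name = unicodedata.name(ch, "")
        suffix = name[len(SMALL_CAP_PREFIX):]
        if name.startswith(SMALL_CAP_PREFIX) and len(suffix) == 1:
            # The last word of the name is the plain capital letter.
            out.append(suffix)
        else:
            out.append(ch)
    return "".join(out)


print(normalize_transcription("C\u00e6\u017far o\ufb00icium"))  # prints "Caesar officium"
```

Uppercase ligatures and further abbreviation signs would need their own entries in the table; they are omitted here to keep the sketch short.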

Although the model already gives good results, the project is still working on a few issues: the transcription of quotation marks remains somewhat inconsistent, and the error rate for Greek words and passages is still high. To a lesser degree, the same applies to (German) Fraktur.

We hope that the Noscemus model will make transcription life easier for many of you. For all those working on other kinds of documents, don’t forget to have a look at the other models we have recently been able to publish thanks to our hard-working users. You can find an overview of all our public models in this document: https://transkribus.eu/wiki/images/d/d6/Public_Models_in_Transkribus.pdf

