+ Presenting the Noscemus public model

We are happy to present one of our public models: “Noscemus GM v1”, released by Stefan Zathammer as part of the Innsbruck-based project NOSCEMUS (Nova Scientia: Early Modern Scientific Literature and Latin). The model reads texts set in Antiqua-based typefaces from the 16th, 17th and 18th centuries and outperforms most standard OCR engines. Although it is tailored to transcribing (Neo-)Latin texts, it also delivers convincing results for other languages such as French, Italian or English. The Noscemus model can therefore help not only Neo-Latinists but anyone whose research deals with large text corpora from the Early Modern Period.

The model is based on about 1,000 pages of training data from the project's Digital Sourcebook. To keep the model as flexible as possible, standardization in the transcription process was kept to a minimum. Normalizations were made only in the following cases: ligatures (e.g. ae, oe, ct, ff) and abbreviations (e.g. -que, -us, -tur, …mm…) were expanded, long s (ſ) was transcribed as a normal s, and small caps were transcribed as majuscules.
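For readers who want to compare their own transcriptions with the model's output, the ligature and long-s conventions described above can be approximated in a few lines of Python. This is only a sketch under assumptions: the function name `normalize_transcription` and the character table are hypothetical and are not part of the NOSCEMUS project or of Transkribus. Abbreviations and small caps are deliberately left out, because expanding them depends on context and typography.

```python
# Hypothetical helper: approximates two of the normalization rules above.
# The mapping is an illustrative assumption, not the project's actual table.
LIGATURE_EXPANSIONS = {
    "\u00e6": "ae",   # æ
    "\u00c6": "AE",   # Æ
    "\u0153": "oe",   # œ
    "\u0152": "OE",   # Œ
    "\ufb00": "ff",   # ff ligature
    "\ufb01": "fi",   # fi ligature
    "\ufb02": "fl",   # fl ligature
    "\ufb03": "ffi",  # ffi ligature
    "\ufb04": "ffl",  # ffl ligature
    "\ufb05": "st",   # long s + t ligature
    "\ufb06": "st",   # st ligature
}

LONG_S = "\u017f"  # ſ


def normalize_transcription(text: str) -> str:
    """Expand ligature characters and replace long s with a normal s."""
    for ligature, expansion in LIGATURE_EXPANSIONS.items():
        text = text.replace(ligature, expansion)
    return text.replace(LONG_S, "s")


if __name__ == "__main__":
    # "Cæſar" written with an ae ligature and a long s
    print(normalize_transcription("C\u00e6\u017far"))
```

Applying the same normalization to a reference transcription before comparing it with the model's output avoids counting these deliberate conventions as recognition errors.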

Even though the model already provides good results, the project is still dealing with a few issues. There are some remaining inconsistencies in the transcription of quotation marks, and the error rate for Greek words and passages is still high. To a lesser degree, the same applies to (German) Fraktur.

We hope that the Noscemus model will make transcription easier for many of you. For all those working on other kinds of documents, don’t forget to have a look at the other models we have recently been able to make public thanks to our hard-working users. You can find an overview of all our public models in this document: https://transkribus.eu/wiki/images/d/d6/Public_Models_in_Transkribus.pdf
