Success Story

My READ/Transkribus Story: Tobias Hodel

In 2016 I joined the READ project for the state archives of Zurich. Within the large project, I became part of the dissemination working group and was responsible for the alignment of more than 100’000 pages of handwritten minutes of the Zürich executive from the 19th century. Thanks to READ, I not only travelled across Europe and the US for more than 50 Transkribus-related workshops and talks, but also got in contact with numerous scholars, archivists, librarians, and scientists trying to get the most out of HTR, KWS, (semantically enhanced) layout analysis, and much more. I had the privilege of seeing written cultural heritage in its incredible variety and discussing its specificity with experts.

One consequence of using, thinking, and talking about machine learning daily was encountering this approach, with its advantages and problems, in depth and shaping my research accordingly. The result of my use of Transkribus was thus not only hundreds of HTR+ and PyLAIA models and thousands of pages of Ground Truth (see, e.g., the public model StAZH_RRB_German_Kurrent_XIX, based on 26 million words). It’s rather the insight that it’s our duty as scholars to use and critically analyze deep learning, not only to make cultural heritage accessible but also to help understand the technology and its pitfalls for our future benefit.

Regarding Transkribus, I consider the platform ready to use when several hundred images need to be processed and a stable environment is essential. For a scholarly edition project (koenigsfelden.uzh.ch), we used Transkribus as our hub for transcriptions, with several HTR models as a by-product. At the end of my tenure at the state archives of Zürich, we started a variety of projects building on HTR+ and p2pala to prepare vast amounts of pre-modern text and to use semantic annotations to speed up archival indexing. For the whole GLAM field, I believe this is the way to go.

In 2019 – in no small part thanks to the success of READ – I was offered a tenure-track position at the University of Bern with the task of providing the faculty with approaches to digital humanities. Since then, I have been using Transkribus in teaching and am currently thinking about the next steps in text annotation, including Named Entity Recognition (esp. for historical languages) and Content Extraction (e.g., using Topic Modeling).

Want to know more? I have published in German and English about Transkribus, HTR, and the consequences of using machine learning in the humanities (besides some work about the Middle Ages 😉).

See my page at the University of Bern, my ORCiD profile, or follow me on Twitter.
