The excitement is building – it’s nearly time for this year’s International Medieval Congress in Leeds. The READ project will be presenting a panel on the morning of Monday 3 July to show that yes, Handwritten Text Recognition can work even on medieval documents! Scroll down for full details of the panel, including abstracts of the papers.
We are also hosting a separate workshop at the University of Leeds on Wednesday 5 July for anyone interested in learning more about the technology – please email Tobias Hodel for details.
Monday 3 July, 11:15am, Session no. 139. The Digital Scribe: Handwritten Text Recognition (HTR) of Medieval Documents
Abstracts of the papers:
Elena Muehlbauer (Passau Diocesan Archives), From Tables to Transkribus. From information to knowledge. Working with parish registers. [Change to the scheduled programme]
The Diocesan Archives of Passau preserve more than 800,000 pages of parish registers. These pages record the important stages of life (birth, marriage and death) of Catholics all over Bavaria and Austria. Such facts are highly revealing for genealogists, but also for social historians who wish to understand the development of modern life. With the technology available in the Transkribus platform, we are now able to access a selection of registers written in a very specific way: on tables and forms issued to the priests by the newly founded state. We are currently working on an engine that will automatically extract information from the words. With the help of Transkribus, data is transformed into information, and information into knowledge.
Maria Kallio (National Archives of Finland), Transkribus and the Archives of a Brigittine Monastery: Making Digital Editions of Naantali Documents
In the summer of 2016, the National Archives of Finland started a project to produce new editions of medieval charters originating from the Brigittine Monastery of Naantali. The goal was to edit 136 documents and publish them in digital form in the Diplomatarium Fennicum database. Because several researchers were working on the project, there was a serious need for a flexible platform on which collaboration would be easy. Since an advanced transcription undertaken in Transkribus can serve as the basis for a digital edition, the project chose to work with this platform. The presentation describes the workflow and the project results, as well as the challenges encountered and the insights gained during the project.
Tobias Hodel (State Archives of Zurich), Sending 15th-Century Missives through Algorithms: Testing and Evaluating HTR with 2,200 Documents
Is it possible to teach algorithms to read medieval handwriting? Does it make sense to have the material prepared by students who are learning to read Gothic script at the same time? These two simple questions lay the groundwork for a discussion of whether and how handwritten text recognition and the teaching of the Middle Ages can be intertwined.
The material used to address these questions consists of 2,200 missives from Thun, a small town in Switzerland. 120 documents were transcribed and used for training. In the process, three difficulties were identified: different and changing hands, complex layout structures, and abbreviations. These difficulties are typical of such an endeavour. Unfortunately, the recognition results are insufficient and can be used by scholars only with caution. The ‘small’ amount of training material is one reason for the low recognition rates. Language models can improve the results, although crucial parts such as names and verbs still remain only partially identifiable. At the same time, combining teaching with cutting-edge technological tools proved engaging. The students involved were highly motivated and welcomed the opportunity to take part in a digital research endeavour.