In 2019 the KB National Library of the Netherlands welcomed the Researcher-in-Residence project Entangled Histories Ordinances of the Low Countries. Within this project, Annemieke Romein, Sara Veldoen and Michel de Gruijter studied early modern legislation published in volumes of printed texts. Transkribus was used to make these early modern printed texts (set in, e.g., Dutch Gothic type) machine-readable.
The 108 volumes used within Entangled Histories contain thousands of rules from the early modern era. Because the indexes of the various volumes were created with different standards and keywords, searching through the texts is challenging when they need to be compared. Entangled Histories therefore aimed to separate the individual texts and subsequently categorise them according to a controlled vocabulary. As the segmentation of texts is a field still heavily under development, several tests were run to find out what would work. To categorise the legal texts automatically, the Finnish tool Annif was used. This tool has various back-ends that offer a range of options for automatic categorisation. Because the project used a controlled, hierarchical vocabulary, the team had to create a SKOS vocabulary specifically for this project. Though Annif had not been used with hierarchical structures before, the team obtained excellent results, even though they ended up using only 400 texts in the case study.
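To illustrate what a hierarchical controlled vocabulary adds over a flat keyword list, here is a minimal Python sketch. It is not Annif and not the project's actual SKOS file: the concept identifiers and labels are invented, and a plain dictionary stands in for SKOS-style "broader" links. It shows one way a suggested category can be compared with the expected one level by level.

```python
# Hypothetical mini-vocabulary (all ids and labels are invented for illustration):
# concept id -> (label, id of the broader concept, or None for a top concept).
VOCAB = {
    "c1": ("public order", None),
    "c1.1": ("markets", "c1"),
    "c1.1.1": ("weights and measures", "c1.1"),
    "c2": ("religion", None),
}


def broader_chain(concept_id, vocab):
    """Return the concept followed by all its ancestors, most specific first."""
    chain = []
    current = concept_id
    while current is not None:
        chain.append(current)
        current = vocab[current][1]  # step up to the broader concept
    return chain


def shared_levels(predicted, expected, vocab):
    """Count how many levels, starting at the top concept, two concepts share."""
    top_down_predicted = reversed(broader_chain(predicted, vocab))
    top_down_expected = reversed(broader_chain(expected, vocab))
    shared = 0
    for a, b in zip(top_down_predicted, top_down_expected):
        if a != b:
            break
        shared += 1
    return shared


if __name__ == "__main__":
    # A suggestion that is too specific still agrees on the two upper levels.
    print(shared_levels("c1.1.1", "c1.1", VOCAB))
    # A suggestion from a different branch shares nothing.
    print(shared_levels("c2", "c1.1", VOCAB))
```

With a flat list, a suggestion either matches or it does not; with a hierarchy, a suggestion in the right branch but at the wrong depth can still be recognised as partly correct.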
Such a combination of techniques (recognition, segmentation and categorisation) could also be of interest to other projects and archives that want to add metadata to individual texts quickly. If you want to know more, you can check out the recent open-access publication in the DHBenelux Journal: The Datafication of Early Modern Ordinances.