The largest and richest collection of documents kept in the Amsterdam City Archives is without doubt the Notarial Archives, which start in 1578. The collection spans over 30,000 volumes, and after 8.6 million scans, digitization is only halfway complete. Handwritten Text Recognition (HTR) would be revolutionary in unveiling its content. But would it prove useful for such a huge and varied collection, containing documents in all formats, spanning several centuries, sometimes damaged by fire or water, and, not least, written by hundreds of different hands? Transkribus gave us the option to grow and experiment.
The citizen-science project Crowd Leert Computer Lezen on the platform ‘VeleHanden’ combined the power of the crowd with the HTR tooling of Transkribus. The volunteers provided Ground Truth transcriptions for 10,000 scans in a very short time. With these we were able to train notary-specific models (which work for selected sets), combine the data into general models (very satisfactory for 18th-century documents), and combine them with existing base models (even better!).
The next step was to run HTR on large quantities of scans and make the results available to the public. The first few hundred thousand pages are now searchable thanks to the Transkribus Read&Search interface. Even though the HTR is far from perfect, the advanced options for fuzzy search with a high tolerance reveal many treasures.
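To illustrate why a tolerant fuzzy search still finds hits in imperfect HTR output, here is a minimal sketch in Python. It is not how Read&Search is implemented; it simply assumes that tolerance means a maximum edit distance between the query and a recognized word. The function names and the sample line are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_find(query: str, text: str, tolerance: int) -> list[str]:
    """Return the words in text within `tolerance` edits of the query."""
    query = query.lower()
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    return [w for w in words if w and edit_distance(query, w) <= tolerance]


# Hypothetical HTR output containing spelling variants and recognition errors.
htr_line = "Compareerde voor mij notaris de schilderye van Rembrant"

print(fuzzy_find("schilderij", htr_line, tolerance=2))
print(fuzzy_find("rembrandt", htr_line, tolerance=2))
```

With a tolerance of zero neither query matches; allowing a couple of edits recovers both words, at the cost of more false positives on short words.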
As Amsterdam was (and is) an international hub, the HTR results are of interest to scholars worldwide. Within hours, they discovered new details of a mixed and growing population in long-gone neighbourhoods, proof of new global connections, the provenance of numerous European paintings, and even a Surinam-Indian chief who turned out to have been in an Amsterdam notary's office in 1682. See Alle Amsterdamse Akten for more stories.