Transkribus can automatically produce transcripts of historical material with very impressive results, where 90-95% of characters in a given transcript are correct. Just take a look at the slides and videos from our recent Transkribus User Conference to see some of the best outputs generated by our users.
But the potential of Automated Text Recognition is even greater when it comes to keyword searching! Transkribus now includes Keyword Spotting technology, a sophisticated form of keyword searching based on research by the CITlab team at the University of Rostock (one of the READ project partners).
This form of Keyword Spotting is particularly useful because it can work even when there are errors in the results of Automated Text Recognition. The technology searches through the probability values assigned to characters and words during the text recognition process. Once a user enters a search query, the program searches through all possible permutations of each word on the page and returns a range of results, with some more likely to be correct than others. Users can then check the results of the search output and decide which ones to follow up.
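To make the idea concrete, here is a toy sketch in Python of searching over character probabilities rather than over a finished transcript. It is not the CITlab implementation and does not use any Transkribus API: the data layout (one dictionary of candidate characters per character position), the names `keyword_score` and `spot`, the threshold and the sample probabilities are all assumptions invented for illustration.

```python
# Toy illustration only: a hypothetical recognition output in which every word
# on a page has, for each character position, a dict of candidate characters
# and their probabilities. Real recognition engines store richer data.

def keyword_score(positions, keyword):
    """Probability that a word reads as `keyword`, assuming independent positions."""
    if len(positions) != len(keyword):
        return 0.0
    score = 1.0
    for candidates, char in zip(positions, keyword):
        score *= candidates.get(char, 0.0)
    return score


def best_reading(positions):
    """The single most likely transcript, i.e. what a plain text search would see."""
    return "".join(max(c, key=c.get) for c in positions)


def spot(page, keyword, threshold=0.05):
    """Return (word_id, score) pairs above the threshold, most likely first."""
    hits = []
    for word_id, positions in page.items():
        score = keyword_score(positions, keyword)
        if score >= threshold:
            hits.append((word_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented sample data: w2 is most likely "mouse", but "house" is still plausible.
page = {
    "w1": [{"h": 0.9, "b": 0.1}, {"o": 0.6, "a": 0.4}, {"u": 0.7, "n": 0.3},
           {"s": 0.8, "f": 0.2}, {"e": 0.9, "c": 0.1}],
    "w2": [{"m": 0.7, "h": 0.3}, {"o": 0.8, "a": 0.2}, {"u": 0.9, "n": 0.1},
           {"s": 0.9, "f": 0.1}, {"e": 0.95, "o": 0.05}],
    "w3": [{"t": 0.9, "l": 0.1}, {"h": 0.8, "b": 0.2}, {"e": 0.9, "c": 0.1}],
}

hits = spot(page, "house")
for word_id, score in hits:
    print(word_id, best_reading(page[word_id]), round(score, 3))
```

In this made-up example, a search for "house" returns the word w2 as a lower-ranked hit even though its most likely transcript is "mouse". A plain search of the transcript would have missed it, which is why the ranked list is left for the user to check.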
Keyword Spotting is an amazing technological advance, with the potential to open up huge historical collections which have never been transcribed.
To use Keyword Spotting in Transkribus, you need to have trained an Automated Text Recognition model to recognise the documents in your collection. You can find more information about working with Keyword Spotting in our How to Guide: