The Bentham Project at University College London, which is producing the scholarly edition of the writings of the British philosopher Jeremy Bentham, has become increasingly involved with digital humanities over the past decade. The project has digitised thousands of Bentham manuscripts and in 2010 launched one of the first academic crowdsourcing initiatives, Transcribe Bentham. Exciting experiments with Handwritten Text Recognition (HTR) have also been under way for the past few years.
A first HTR model, trained on around 900 pages of Bentham material, produced very good results. The ‘English Writing M1’ model can recognise pages written in a relatively neat hand by Bentham and his secretaries with an impressive Character Error Rate (CER) of 5-10%. This model is publicly available in Transkribus and can be applied, with good results, to English handwriting from the 1800s and 1900s.
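For readers unfamiliar with the metric, CER is commonly defined as the number of character insertions, deletions and substitutions needed to turn the machine transcription into the correct one, divided by the length of the correct transcription. The Python sketch below illustrates that common definition only; the function names and the sample strings are invented for illustration, and Transkribus may normalise or count text differently when it reports its own figures.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning hypothesis into reference."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example: one wrong letter and one missing letter in 22 characters.
reference = "the greatest happiness"
hypothesis = "the greatcst happines"
print(f"CER: {character_error_rate(reference, hypothesis):.1%}")
```

On this definition, a CER of 5-10% means roughly one character in every ten to twenty needs correcting.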
The Bentham Project is now working to improve the automated recognition of Bentham’s most difficult handwriting – written at a time when the philosopher was in his eighties and losing his sight. Early results show a promising CER of 26%, which is a very good basis for Keyword Spotting as a research tool for scholars interested in Bentham’s ideas.
Find out more at the Transcribe Bentham blog!