Some people are horrified by the thought of writing notes on the pages of books. But for the English philosopher John Stuart Mill (1806–1873), marginal notes were a useful way to record his thoughts and observations as he read.
Mill’s collection of books is now in the possession of Somerville College at the University of Oxford. The John Stuart Mill Collection holds more than 1500 books once owned by Mill. Many of these texts contain annotations and markings made by Mill.
Somerville College, in collaboration with the University of Alabama, is currently undertaking a project to digitise and categorise these marginalia. The partners have now begun to work with Transkribus, with a view to applying Handwritten Text Recognition to Mill’s scribblings.
READ partners from Xerox Research Centre Europe and the Computer Vision Lab at Vienna Technical University are working with hundreds of images from the Mill collection. They aim to use Document Understanding to distinguish between the printed and handwritten text on the pages of these books, and Handwritten Text Recognition to transcribe the comments that Mill wrote in the margins. Transcripts of the Mill marginalia would be an invaluable resource for scholars and would complement the forthcoming Mill Marginalia database.
This is an exciting experiment for the READ project, as the methods and results of this endeavour could be applicable to other collections where marginal annotations appear on printed texts. Many other writers, including Oscar Wilde and Mark Twain, were habitual annotators, and technology from the READ project could help us to understand how they read, processed and understood books and articles.