We can all agree that it’s nice to share – and in the READ project, sharing data brings direct benefits for the Handwritten Text Recognition technology in our Transkribus platform. As a general principle of machine learning, the more images and transcripts that are submitted to us as training data, the stronger the Handwritten Text Recognition technology can become. Images and transcripts are not publicly shared, but they contribute to a general improvement in the technology behind the scenes.
Transcribimus is a community project based in Vancouver, Canada, with a sizeable collection of transcripts which they will be using to train a Handwritten Text Recognition model.
Transcribimus began when Sam Sullivan, former mayor of Vancouver, started researching the City Council minutes from the late nineteenth century with a view to exploring the achievements of Vancouver’s second mayor, David Oppenheimer. Sam’s physical limitations prevented him from visiting the archives as often as he would have liked, so he formed a partnership with Margaret Sutherland, a local retiree with experience in genealogy and reading old handwriting. Margaret began transcribing and digitising the minutes for Sam and was gradually joined by other volunteer transcribers, including Christopher Stephenson, a graduate student in Library and Archival Studies who provided a great deal of assistance. Transcribimus eventually became an online platform where more than 20 volunteers have transcribed some 3,500 pages of handwritten minutes.
These transcriptions are already freely available on the Transcribimus website. The City of Vancouver Archives will ultimately display the images and transcripts on their website too.
The vast majority of the minutes are written in one hand, so these images and transcripts will likely feed into a strong Handwritten Text Recognition model that produces useful transcripts of the collection. Transcribimus volunteers could then check and correct any errors in these automated transcripts – and the transcription of the City Council minutes should hopefully be completed more quickly!
- Do you have existing transcriptions that you have produced or collected as part of a research project? Ideally 500 pages or more…
- Send them to us and we can process them and train a model to recognise the writing in your documents!
- To find out more about working with existing transcripts, consult our How to Guide or contact us.