Sharing data with Transkribus – Transcribimus and minutes of Vancouver City Council

We can all agree that it’s nice to share – and in the READ project, sharing data brings direct benefits for the Handwritten Text Recognition technology in our Transkribus platform. According to the principles of machine learning, the more images and transcripts that are submitted to us as training data, the stronger the Handwritten Text Recognition technology can become. These images and transcripts are not shared publicly, but they contribute to a general improvement in the technology behind the scenes.

Transcribimus is a community project based in Vancouver, Canada, with a sizeable collection of transcripts which it will be using to train a Handwritten Text Recognition model.

Transcribimus began when Sam Sullivan, former mayor of Vancouver, started researching the City Council minutes from the late nineteenth century to explore the achievements of Vancouver’s second mayor, David Oppenheimer. Sam’s physical limitations prevented him from visiting the archives as often as he would have liked, so he formed a partnership with Margaret Sutherland, a local retiree with experience in genealogy and reading old handwriting. Margaret began transcribing and digitising the minutes for Sam and was gradually joined by other volunteer transcribers, including Christopher Stephenson, a graduate student in Library and Archival Studies who provided considerable assistance. Transcribimus eventually became an online platform where more than 20 volunteers have transcribed some 3,500 pages of handwritten minutes.

Image from the City Council Minutes. City of Vancouver Archives, VMA 23-5 page 214. Image credit: Margaret Sutherland.

These transcriptions are already freely available on the Transcribimus website. The City of Vancouver Archives will ultimately display the images and transcripts on its website too.

The vast majority of the minutes are written in one hand, so these images and transcripts will likely feed into a strong Handwritten Text Recognition model that produces useful transcripts of the collection. Transcribimus volunteers could then check and correct any errors in these automated transcripts – and the transcription of the City Council minutes should hopefully be realised more quickly!

  • Do you have existing transcriptions that you have produced or collected as part of a research project?  Ideally 500 pages or more…
  • Send them to us and we can process them and train a model to recognise the writing in your documents!
  • To find out more about working with existing transcripts, consult our How to Guide or contact us.
