In exciting news, Transkribus has started to tackle the papers of the seminal French philosopher Michel Foucault.
The team at the Foucault fiches de lecture (Foucault’s Reading Notes) project have trained a model to recognise the philosopher’s writing with around 90% accuracy. Automated transcripts will be vital to the project’s objective: to analyse and provide online access to the large collection of Foucault’s fiches de lecture (organised citations, references and comments) held at the Bibliothèque nationale de France.
Members of the project team, Marie-Laure Massot, Arianna Sforzini and Vincent Ventresque, have written a detailed report (in French) summarising their experiments with Transkribus and explaining how they plan to make these automated transcripts available to researchers.
In an initial test, the team trained an Automated Text Recognition model by using Transkribus to segment and transcribe 200 digitised images of Foucault’s handwriting. This first model was capable of generating transcripts with a Character Error Rate of 15% (i.e. 85% of characters transcribed correctly by Transkribus).
They then extended their training set to 600 images of Foucault’s writing, which included notes relating to La volonté de savoir, part of Foucault’s four-volume work on the history of sexuality. This larger set of ground truth produced significantly better results: they now have a text recognition model which can transcribe Foucault’s handwriting with a Character Error Rate of just 8%.
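For readers unfamiliar with the metric, the Character Error Rate figures quoted above can be illustrated with a short Python sketch. It assumes the common definition of CER: the edit (Levenshtein) distance between the automated transcript and the hand-made ground truth, divided by the length of the ground truth. This is not necessarily the exact calculation Transkribus performs, and the function names and sample strings are invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Illustrative strings only, not taken from the project's data.
reference = "une fiche de lecture"   # hand-made ground truth (20 characters)
hypothesis = "une fiche du lectur"   # hypothetical automated transcript

cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.0%}")             # CER: 10%
print(f"Accuracy: {1 - cer:.0%}")    # Accuracy: 90%
```

Reading 1 minus CER as "accuracy" is a convenient simplification. Under this definition, a transcript containing many spurious extra characters can score a CER above 100%.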
The team’s report offers a useful insight into the practicalities of creating training data for Automated Text Recognition. The team elaborate on the challenges of deciphering Foucault’s handwriting, where abbreviations and individual characters are often written ambiguously. They also describe the learning curve of working with Transkribus: it took them a few hours of intensive work to learn how to navigate the platform, and around 30-40 minutes to segment and transcribe each of Foucault’s fiches (1-2 images).
Thanks to Transkribus, the Foucault fiches de lecture project will now be able to facilitate access to a broad selection of Foucault transcripts. They plan to develop a collaborative platform using Omeka software, where Foucault specialists can consult, search, improve and annotate the transcripts of this complex and historically significant material.
Copyright © 2020 READ-COOP SCE