Creating a digital scholarly edition of the Lovelace papers with Jessica Cook

Mathematics, computing, music, poetry: Ada Lovelace was a person of many talents. The 19th-century mathematician, daughter of the infamous Romantic poet Lord Byron, is widely regarded as one of the pioneers of modern computing thanks to her work on Charles Babbage’s “Analytical Engine”. One of Lovelace’s most groundbreaking ideas was that the machine could be used for more than just arithmetic: something that, in today’s world of smartphones and servers, turned out to be very true indeed.

However, despite Lovelace’s technical achievements, we know surprisingly little about her life. While there is a wealth of papers and documents stored in the Lovelace archive, much of it is inaccessible, making it difficult to conduct scholarly research on the sources. But that is about to change.

Jessica Sherrill (Cook), a PhD candidate in the English department at UCLA, is currently transcribing the entire Ada Lovelace archive to uncover more about the life of this amazing mathematician. She hopes that by making it easier to access Lovelace’s personal writings, we will have a better insight into Lovelace’s thinking, and into how her poetry influenced her revolutionary ideas about computing. We had a chat with Jessica to find out more about this unique research project and her experience with Transkribus.

Accessing the Ada Lovelace Archive

There hasn’t always been so much interest in the life and work of Ada Lovelace. In the past, much more attention was paid to her male collaborator, Charles Babbage, with Lovelace seen as a mere assistant. “Many accounts portrayed Lovelace as a naive and somewhat star-struck young woman with no advanced mathematical knowledge and little true visionary insight,” Jessica explained. “But beginning in the late 1970s, a number of studies began to delve more fully into Lovelace’s intellectual legacy. Together they convincingly demonstrate that she did indeed possess the requisite technical knowledge to significantly shape the conceptualization of the Analytical Engine.”

The Bodleian Library at Oxford University is responsible for preserving the entire Lovelace collection. © subherwal, distributed under a CC-BY 2.0 license

Yet despite this increased interest in Lovelace’s work, scholars still struggle to get a full picture of her life. This is not due to a lack of historical sources. At the Bodleian Library in Oxford, there is a large collection of letters, notes, and other papers all written by and to the remarkable mathematician, providing a wealth of insights into her life. But accessing this rich collection remains a challenge. As the Lovelace archive has never been published in a full scholarly edition, researchers find it hard to conduct effective research on it. “Most scholars working on Lovelace today only have access to an incomplete and often misleading portrait of her life,” Jessica explained. “Until this gap is rectified, our knowledge of this remarkable woman will always be necessarily incomplete.” Convinced of the potential of the Lovelace archive, Jessica boldly decided to rectify this gap herself and create a fully transcribed digital version of the Lovelace archive.

How to transcribe 14,000 pages

Of course, digitising an archive on this scale is no small feat. The Lovelace archive contains around 14,000 pages, which would take a solo researcher like Jessica years to manually transcribe. Therefore, she started looking for a more time-effective method and found she had two main options. “The first was to crowd-source, which has been successfully used by digital humanities projects like the Dickens Code Project,” Jessica explained. “There are some incredible benefits that can come from this kind of public-facing model, but it also comes with some significant pitfalls.”

Crowd-sourcing is quite a resource-intensive method. It requires materials to be professionally digitised, it involves a lot of outreach and training, and it means cooperating with many team members with varying levels of involvement and expertise. “All these tasks cost time and money. As such, this kind of model may not be the best fit for graduate students, postdoctoral students, or others who operate outside the auspices of tenured academia.”

Jessica’s second option was to use software to do the transcription automatically. After much searching on academic blogs and forums, she came across the Transkribus handwriting recognition software. Transkribus uses AI models to turn the text in handwritten documents into digital text, which can then be stored as a database and made searchable through tags and metadata. “By training AI models to perform much of the actual transcription work, I can use the keyword search feature to find material specific to my immediate dissertation research goals, instead of relying on guesswork for where relevant material might be located.”

The fact that Transkribus was developed by a team of academics who understood the process of historical archive research was also a key factor for Jessica, as were the scholarship options available to her. The Transkribus Scholarship Programme awards free credits to young researchers looking to use Transkribus for their project, and due to the academic importance of Jessica’s project, she was granted enough credits to cover the transcription of all pages. “Thanks to the scholarship, I will be able to use Transkribus to transcribe the entirety of my collection.”

Creating the Lovelace AI model with Transkribus

Jessica’s first step in the transcription process was to learn exactly what is possible with Transkribus. “I would urge anyone starting a Transkribus project to take the time to fully read the website and listen to all the recorded Q&As on YouTube,” she advised. There was so much relevant information available that Jessica started collating everything she read into a personal training manual for future reference. “And most importantly, I reached out to the READ-COOP team a lot with questions, and they have been very helpful in answering my queries.”

Armed with her newfound knowledge, Jessica then set about training her AI model with Transkribus. So far, she has transcribed about a tenth of the collection and has been very happy with the initial results. But in some ways, working with the software was different to how she expected. “Initially, I wanted to get the Lovelace model to perform almost as well as I could read and transcribe. As my work has progressed, my view of this joint operation has shifted. While the Lovelace model sometimes misses words or letters that seem relatively easy to the human eye, many times it has recognized words that both previous scholars and I have incorrectly deciphered. This process has truly been one of collaboration, where the human and the machine bring different expertises.”

Training the Lovelace model correctly is particularly important at this stage of the project, as it lays the foundations for future stages. The Lovelace archive contains papers not only from Lovelace herself but also from her husband, William King, and her mother, Lady Noel Byron, which also need to be transcribed. “I will use the Lovelace model as a base model to train additional models on the handwriting of other major contributors,” explained Jessica. This strategy will save her both time and transcription effort in the months to come, and lays the groundwork for further research into the collection after her PhD.

A meaningful project

There is a beautiful symmetry to this project. Ada Lovelace was one of the first people to suggest that the computers of the future would be able to do much more than just calculate numbers. Almost 200 years later, Jessica Sherrill (Cook) is using computers to transcribe documents, enrich them, publish them, and by doing so, uncover more about the life of this remarkable woman.

Mathematician Ada Lovelace was one of the pioneers of modern computing. © Alfred Edward Chalon, distributed under a CC-BY 2.0 license

“Without overestimating Lovelace’s contributions to computer science, I feel that it is appropriate to say that her intellectual openness and creativity opened a small window onto the world of miraculous digital possibility that we now find ourselves in. I am still working through the full conceptual implications of this question, but on a personal level I would like to think that this Transkribus project is a proud fulfillment of Lovelace’s early insights.”

Jessica’s Transkribus Tip

To anyone thinking about embarking on a similar project: go for it. Don’t be discouraged by those who dismiss large-scale digital projects as endeavours that are too much work to be undertaken by graduate students. It is important to know your limits, but I also think that the future of digital research hinges upon scholars who focus on solutions-oriented approaches to challenges.
