Success Story
Published: 1 year ago

How the State Archives of Zurich published 50,000 pages with read&search

The State Archives of Zurich is the central archive for the Canton of Zurich, Switzerland. They are responsible for storing and preserving documents from councils, courts, administrations, and many other public bodies in the canton, giving a rich insight into the last 1150 years of Zurich’s history. 

The State Archives of Zurich is one of the largest archives in Switzerland. © Staatsarchiv Zürich

But naturally, you can only access these insights if you can access the documents themselves. And due to the sheer number of documents in the archive, this is not always easy to do. In the past, there was no way to quickly search whole premodern collections for relevant documents, meaning you would have to search through shelves and boxes, trying to find the papers you need. 

To solve this problem and make it easier to access premodern documents, the archive is currently using Transkribus and read&search to create digital versions of their collections. We spoke to Christian Sieber, who leads the digitisation team at the Zurich archive, about their first Transkribus project and their experience with the software.

50,000 pages of meeting minutes transcribed

The State Archives’ history with Transkribus goes back many years: they were one of the partners in the original READ research project, which ran from 2016 to 2019. Back then, the Zurich archive mainly transcribed by hand, which is far more time-consuming, so they immediately saw the benefit of automatic transcription. “In the past, we used to transcribe handwritten texts manually – and we know how much effort it takes,” Christian explained. “So it was clear to us early on that we would want to use Transkribus for our own projects in the future.”

The archive is home to documents from the last thousand years of Swiss history. © Staatsarchiv Zürich

By 2019, the archive was ready to start their first large-scale Transkribus project. The documents chosen were the 18th-century meeting minutes of the Zurich Council, as they provide insights into almost every interaction between the city council and its inhabitants at that time. However, as this collection ran to over 50,000 pages, it was going to take some work. The first thing the team did was train an AI model. They manually transcribed 203,189 words as training data and got down to a CER (character error rate) of just 4.80% on the training set. This created a solid foundation for transcribing the rest of the documents.
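For readers unfamiliar with the metric, the character error rate is commonly defined as the number of character-level edits (insertions, deletions, substitutions) needed to turn the automatic transcription into the correct one, divided by the length of the correct text. The sketch below illustrates that common definition only. It is not Transkribus code, the function names are hypothetical, and the sample strings are invented for illustration rather than taken from the Zurich minutes.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum character edits to turn hypothesis into reference."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # reference char missing from hypothesis
                current[j - 1] + 1,                   # extra char in hypothesis
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance divided by the number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: one wrong character in a 15-character line.
manual_transcription = "council minutes"
automatic_transcription = "councel minutes"
cer = character_error_rate(manual_transcription, automatic_transcription)
print(f"CER: {cer:.2%}")
```

By this definition, a CER of 4.80% corresponds to roughly five character errors per hundred characters of text.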

As with any large-scale project, there were some hiccups along the way. “One of the challenges in the texts were marginalia, whose layout was not always correctly recognized,” Christian explained. “Another was the various writers who worked on the protocols. But with Transkribus, we were able to process over 50,000 pages of Zurich Council minutes in just three years, which would never have been possible with manual transcription.”

Making the collection digitally available

Of course, transcribing is only half the story. To make collections like these accessible to researchers and the public, the transcriptions need to be published in an online database, preferably one that is both reliable and easily searchable. For the State Archives of Zurich, the best option for this was to use Transkribus’ sister platform: read&search.

“It was important for us to have a standard solution, which is used by many other projects and will be continuously developed,” Christian said about their decision-making process. “With read&search, READ-COOP offered a publishing solution that convinced us.”

As the archive had already done all their transcriptions with Transkribus, their digital collections could simply be uploaded to read&search, along with any metadata and tags that had been assigned during transcription. That means that, for example, if someone wants to find all council meeting records involving a certain council member, they can search for that person’s name and find every record in which that person is mentioned.

“In the past weeks, we have already received positive feedback from many researchers. Some have suggested to manually correct the texts with Citizen Science and thus further improve them.”

The future looks bright

This project is just the first of a few that the archive has planned. Starting next year, the team will begin digitising the Zurich Council minutes from the 15th to the 17th centuries, creating an online database of nearly 400 years of council documents. Over the next few years, the plan is to make all of the archive’s key documents freely accessible online, so that researchers and other interested parties can find the information they need as quickly and as easily as possible.

“By digitising our collections, we are creating an invaluable resource which meets the great demand from researchers from all over the world interested in the history of Zurich.”

Christian’s Transkribus advice

“Make sure you have a good overview of the texts you want to digitise. This makes it easier to plan everything to meet your requirements, and not waste time and money along the way. In short: the better you know your texts, the better you can plan the project.”

The documents at the archive provide a wealth of insights into the history of Zurich. © Staatsarchiv Zürich