Success Story
Published: 2 months ago

Transcribing the personal album of Zoila Aurora Cáceres

How can a scrapbook full of notes, clippings and illustrations give us insight into the social history of Latin American women writers? The personal album of Zoila Aurora Cáceres is more than a scrapbook: it is a personal archive and part of the cultural heritage of Peru and Latin America. Now, a team from the Laboratory of Digital Humanities (Hlab) of the Pontifical Catholic University of Peru is transcribing and cataloguing the album with Transkribus.

That team includes Javier Vera, an expert in linguistics and mathematics; Ainaí Morales, a literature specialist; Monica Arakaki, an information scientist; Mariana Reyes, a historian; and Dina Cornejo, an archaeologist. Together, they gave us a deeper insight into their collaboration and the importance of this project.

Writer, intellectual and feminist activist Zoila Aurora Cáceres

Zoila Aurora Cáceres (1872–1958), born in Lima, Peru, was an accomplished writer, journalist and social activist. Associated with the modernismo movement, she wrote essays, novels and travel literature, drawing on her own experiences and significant periods in Peruvian history. Her literary and journalistic work was published in renowned periodicals in France, Spain and South America. But Cáceres’ impact extends beyond her literary and intellectual legacy.

Page from Zoila Aurora Cáceres’ album, Laboratory of Digital Humanities.

Born in the late 19th century, Zoila Aurora Cáceres faced the deep-rooted gender bias and inequality of her time, which denied women access to higher education and equal civil rights. Cáceres earned a doctorate at the Université Sorbonne, an achievement all the more notable given that the university excluded women from attending lectures until 1880, when Cáceres was already 8 years old. It would be another 28 years before Peru’s universities opened their doors to women in 1908.

Mariana Reyes Mugaburu explains that “Zoila Aurora Cáceres is not only a major Peruvian writer and intellectual, whose work is fundamental for rethinking the literary history and the history of ideas in the region, but also a prominent feminist activist and associationist committed to the rights and better lives of women.” Through Transkribus, the Hlab team can use the personal album as a window into Cáceres’ experiences and contributions.

Page from Zoila Aurora Cáceres’ album including her photo, Laboratory of Digital Humanities.

Zoila Aurora Cáceres’ Album

From founding several associations for women’s education and socio-economic support, campaigning for women’s citizenship and civil rights, to supporting women workers, much is known about Cáceres’ social activism. “Nevertheless, there are many silences in her history that can be filled to some extent with the information contained in her album,” states Ainaí Morales.

“Zoila Aurora Cáceres’s personal album opens up avenues for an alternative and more complex understanding of Peru’s intellectual and political history, and especially of women’s history at the turn of the century in Latin America,” Ainaí continues. The album itself is “a kind of scrapbook that contains personal information, newspaper clippings (with literary texts, reviews of Cáceres’ writings, news articles), as well as personal and professional correspondence, […] and illustrations.” In order to find a way through the complex contents of the album, it had to be structured, transcribed and categorised. 

Page from Zoila Aurora Cáceres’ album, Laboratory of Digital Humanities.

Navigating complexity with Transkribus

The album of Zoila Aurora Cáceres and its contents can be understood as a personal archive that contributes to the cultural heritage of Peru and Latin America, offering insights into the literary, cultural and social history of women writers. A multidisciplinary team from the Laboratory of Digital Humanities (Hlab) is working on the systematisation of this personal archive.


As team member Javier Vera explained, “By working with datasets, we aim to array data in such a way that it becomes easily accessible and navigable, without blurring the documents’ complex materiality”. The Hlab team makes clear that “the datasets do not aim to impose new hierarchies in the album, but to showcase its multiplicity.” To transcribe and catalogue the album, Transkribus proved to be the most efficient tool, as it is capable of transcribing text while classifying it and respecting the materiality of the document.

Page from Zoila Aurora Cáceres’ album, Laboratory of Digital Humanities.

Working with Transkribus

The 304-page album holds a variety of items such as letters, photos and newspaper clippings, meaning the Hlab team had to work with printed as well as handwritten text. In the initial research phase, the team transcribed over 50 printed literary texts using the Transkribus Print M1 model for Spanish, which they found to be the best-performing model. The second phase involved handwritten documents, such as multilingual letters. Filter and search options helped organise this large collection, and textual tags such as location or person tags allowed the team to start building an interactive map of Cáceres’ journey.
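To illustrate how textual tags can feed something like a map, here is a minimal Python sketch that collects tagged words from a transcribed line. It is not the Hlab team's actual workflow. It assumes that tags arrive as a single string of the form `name {offset:N; length:M;}`, which is how tags are commonly stored in the `custom` attribute of exported PAGE XML; the tag name `place`, the function name `extract_tags` and the sample line are hypothetical and not taken from the album.

```python
import re
from collections import defaultdict

# Assumed tag syntax: "name {key:value; key:value;}" repeated in one string.
TAG_PATTERN = re.compile(r"(\w+)\s*\{([^}]*)\}")


def extract_tags(line_text, custom):
    """Return {tag_name: [tagged substrings]} for one transcribed line."""
    found = defaultdict(list)
    for name, body in TAG_PATTERN.findall(custom):
        # Split the "key:value;" pairs inside the braces into a dict.
        props = {}
        for part in body.split(";"):
            if ":" in part:
                key, value = part.split(":", 1)
                props[key.strip()] = value.strip()
        # Only entries with a character span mark a stretch of text;
        # others (such as a reading-order entry) are skipped.
        if "offset" in props and "length" in props:
            start = int(props["offset"])
            end = start + int(props["length"])
            found[name].append(line_text[start:end])
    return dict(found)


# Hypothetical sample line, invented for illustration.
line = "Carta enviada desde Paris a Lima"
custom = (
    "readingOrder {index:0;} "
    "place {offset:20; length:5;} "
    "place {offset:28; length:4;}"
)
tags = extract_tags(line, custom)
print(tags)
```

The place names gathered this way could then be matched to coordinates in a separate step to plot a journey; that geocoding step is outside the scope of this sketch.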

While the printed text was quite manageable, about half of the documents, including letters and postcards in five languages, required manual transcription due to their diverse nature. Another challenge was the layout of the newspaper clippings. “In this case, dealing with layouts was really challenging! Through trial-and-error, we learned to fine-tune the layout recognition and get more accurate text regions”, reports Monica Arakaki.

To address the challenge of mixed materials, Transkribus is currently developing Super Models that can handle different document types and, depending on the Super Model, different languages. In addition, it is already possible to train layout models such as Field and Table Models to improve layout recognition for documents such as newspapers.

Achievements and Next Steps

Since the start of the project in July 2023, the Hlab team has been cataloguing each of the documents in the album as well as transcribing them using Transkribus. The team has already completed a quarter of the content, including pages and images, and is aiming to complete the entire album. Having put considerable effort into the project, the team hopes to then make the content available to the public. The easiest way to do this would be to publish the album on Transkribus Sites, as the transcription and cataloguing have already been done on the Transkribus platform. Dina Cornejo adds: “it would be very interesting to see Transkribus Sites in action.”

Team members of the Laboratory of Digital Humanities.

We wish the Hlab team all the best with their project and look forward to hopefully seeing Zoila Aurora Cáceres’ personal album published on Transkribus Sites in the future!

Thank you to Mariana and the Hlab team for talking to us and highlighting a remarkable woman in history.

Mariana’s Transkribus Tips:

“We recommend relying on the page enumeration provided by Transkribus for a thorough cataloging of all documents, ensuring that not a single item is overlooked.”

“Initially, using AI for transcription may seem unfamiliar, but with practice, the process becomes much quicker and simpler. Gradually, one discovers the many advantages and possibilities that Transkribus offers.”

Thumbnail:  Zoila Aurora Cáceres, Laboratory of Digital Humanities.
