Discovering a lost play by Lope de Vega: Álvaro Cuéllar

When Álvaro Cuéllar set out to transcribe a series of theatrical works from the Spanish Golden Age, he hoped he would find something interesting. But he did not expect to discover a completely new work by one of Spain’s most famous authors, Félix Lope de Vega y Carpio.

A prolific playwright, novelist, and poet, Lope de Vega was a leading figure in Spain’s Golden Age era. His plays include the renowned El acero de Madrid (The Steel of Madrid), El perro del Hortelano (The Gardener’s Dog), and La viuda valenciana (The widow from Valencia). The discovery by Cuéllar and his colleague, Germán Vega, adds a brand-new work to this list: La francesa Laura (The Frenchwoman Laura).

News of the find spread quickly. “Artificial intelligence attributes an anonymous work from the National Library’s manuscript collection to Lope de Vega,” reported El Pais, followed by similar articles from The Guardian, CNN, and a host of other media outlets. Everyone wanted to know more about how it was possible to make such a discovery using just the power of AI.

*Lope de Vega was one of the most prolific playwrights of the Spanish Golden Age. © BNE*

And they are about to find out. We sat down with Álvaro to discover how the team went about digitising such a vast collection of manuscripts and about finding a long-lost play by Lope de Vega.

Authorship in the Spanish Golden Age: the ETSO project

Meet Álvaro Cuéllar. In his position at the University of Vienna, Álvaro researches literature from Spain’s Golden Age: a period in the late 16th and 17th centuries known for its high artistic activity and achievement. However, it is also a period plagued with authorship problems. There are many manuscripts from the era buried in libraries and archives, yet to be attributed to a particular writer or poet.

*The manuscript was part of an anonymous collection at the Spanish National Library. © BNE*

Álvaro’s project, ETSO — Estilometría aplicada al Teatro del Siglo de Oro (Stylometry applied to the Golden Age Theater) — aims to shed new light on these authorship problems. Together with colleague Germán Vega García-Luengos from the University of Valladolid, Álvaro analyses Golden Age theatrical manuscripts and compares the results against a corpus of works by dramatists from the era. “Our aim is to restore lost or misattributed works to the author. This includes canonical authors like Lope de Vega, but also the other 350 dramatists we have in our database.”

To do this, the team uses a method called stylometry. Stylometry analyses the different aspects of a writer’s particular style, for example how often they use certain words or how many clauses their sentences tend to have. Once a stylometry profile has been created for an author, other texts can be analysed to see how closely they fit that profile, and conclusions can then be made about who wrote the text.

*The character list from the play. © BNE*

What makes the ETSO project different is that this whole process is carried out digitally. The team first creates digital versions of the prints and manuscripts with Transkribus, before using a second AI platform for the stylometric analysis and comparison. The success of this method could set a precedent for future projects of this kind.

Transcribing the manuscripts

The first step in the project was to transcribe the documents: over 1000 prints and 400 manuscripts in total. Many of these, including the Lope de Vega manuscript, came from the Spanish National Library in Madrid. “The Spanish National Library has dedicated a tremendous amount of resources to digitising its Spanish Golden Age theatre collections,” Álvaro explained. “When we approached the library, they had already digitised most of the thousands of pages we needed. The problem was that the documents were scanned, but not transcribed. That’s when we used Transkribus.”

*The team’s model both transcribed and modernised the handwritten text. © BNE*

Because the collection contained both printed texts and handwritten manuscripts, two separate models needed to be created for the transcriptions. In reality, though, Álvaro ended up creating three. “Our first model was able to transcribe Spanish Golden Age prints with incredible success (1% CER). The problem was that we needed these texts with modernised orthography. So this first model was not useful for us, because it only transcribed the original spelling of the texts.”

After a bit more research into Transkribus, Álvaro found a solution. “I realised that I could train Transkribus not only to transcribe texts but also to modernise them at the same time. This seems problematic but because Transkribus works by using groups of letters instead of individual characters, the modernisation was quite successful.”

“Combining editions of the texts and the documents, I was able to train a model with 2 million words that could transcribe and modernise Spanish Golden Age prints (3% CER) and a model trained with 3 million words that was able to transcribe and modernise Spanish Golden Age manuscripts (9% CER).”

You can find all three models on our website:

Spanish Golden Age Prints 1.0

Spanish Golden Age Prints (Spelling Modernization) 1.0

Spanish Golden Age Manuscripts (Spelling Modernization) 1.0

Analysing the stylometry

But the transcription was only half the work. Álvaro and his team also had to analyse the stylometry of the 1400 documents, to see if any of them could be attributed to authors in the ETSO database.

To do this, they used a digital tool called Stylo. “Stylo was developed by Maciej Eder, Jan Rybicki and Mike Kestemont and it is able to compare texts by their usage of words. This is extremely useful for our research and has been proven to be very effective. For example, it correctly classified 99% of the texts written by Lope de Vega in our last control experiments.”

Conveniently for the researchers, Stylo is almost as good at analysing automatic transcriptions as it is at analysing fully edited versions. “We found it extraordinary that the automatic transcriptions gave us approximately the same results as the perfectly edited texts. In the case of La francesa Laura, the relation with Lope de Vega was surprisingly strong, even with the automatic transcription.”

An astonishing discovery

Álvaro never set out to discover a play by such a famous author. But the moment he did is one he will never forget. “I was processing a bunch of manuscripts, as I do every day. Then one of these manuscripts, La francesa Laura, aligned unexpectedly with Lope de Vega in a very strong way. I sent a message to my colleague Germán Vega and I told him that we had something special, but that we had to be extremely cautious about it because it was an automatic transcription and we had to study the text carefully first.”

*The newly discovered play is entitles “La Francesca Laura”, or “The Frenchwoman Laura”. © BNE*

Studying the text took two years of meticulous historical-philological analysis. “We read the text very closely and looked for parallel expressions and ideas between this text and other works by Lope de Vega and the other 350 dramatists we have in our database. We also proceeded with metrical approximations, orthology, rhythms, thematics, dating and so on. All of them gave us the same result: a crystal-clear correlation between this play and the repertoire of Lope de Vega.”

Certain that this was, in fact, a new work by Lope de Vega, the team finally shared their findings with the world. “We did not expect such a repercussion in the national and international news. Maybe the most rewarding has been that three theatrical companies have shown interest in representing the play, which would be extraordinary.”

The project continues…

Álvaro and his team have already had a pretty amazing result, but the project doesn’t stop there. “We have to keep going with the general objectives of the project: collecting all the works of Spanish Golden Age theatre and trying to shed light on authorship problems.”

Bear in mind, too, that it took two years between the technology first attributing La francesa Laura to Lope de Vega and the researchers being sure enough about the attribution to announce it to the world. “That means that in two years, you are going to see what we are working on right now, which is also quite exciting.”

Thanks for talking to us Álvaro, and we look forward to seeing what will happen next with the project.

Cookie	Description	Duration
viewed_cookie_policy	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.	1 hour
PHPSESSID	This cookie is native to PHP applications. The cookie is used to store and identify a users' unique session ID for the purpose of managing user session on the website. The cookie is a session cookies and is deleted when all the browser windows are closed.	1 year

Cookie	Description	Duration
VISITOR_INFO1_LIVE	This cookie is set by Youtube. Used to track the information of the embedded YouTube videos on a website.	5 months
IDE	Used by Google DoubleClick and stores information about how the user uses the website and any other advertisement before visiting the website. This is used to present users with ads that are relevant to them according to the user profile.	2 years

Cookie	Description	Duration
GPS	This cookie is set by Youtube and registers a unique ID for tracking users based on their geographical location	30 minutes
tk_or	This cookie is set by JetPack plugin on sites using WooCommerce. This is a referral cookie used for analyzing referrer behavior for Jetpack	5 years
tk_r3d	The cookie is installed by JetPack. Used for the internal metrics fo user activities to improve user experience	3 days
tk_lr	This cookie is set by JetPack plugin on sites using WooCommerce. This is a referral cookie used for analyzing referrer behavior for Jetpack	1 year
_ga	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, camapign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assigns a randoly generated number to identify unique visitors.	2 years
_gid	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the wbsite is doing. The data collected including the number visitors, the source where they have come from, and the pages viisted in an anonymous form.	1 day
matomo	For statistical analysis, we use “Matomo” on this website. This is an open source tool for web analysis. Matomo does not transmit data to servers outside the control of the READ-COOP. Matomo is deactivated when you visit our website. Only if you actively consent will your usage behaviour be recorded anonymously.	1 year

Cookie	Description	Duration
YSC	This cookies is set by Youtube and is used to track the views of embedded videos.	1 year
_gat	This cookies is installed by Google Universal Analytics to throttle the request rate to limit the colllection of data on high traffic sites.	1 minute

Discovering a lost play by Lope de Vega: Álvaro Cuéllar

Authorship in the Spanish Golden Age: the ETSO project

Transcribing the manuscripts

You can find all three models on our website:

Analysing the stylometry

An astonishing discovery

The project continues…

The COOP

Products & Services

Useful information

Helpful resources

Community