Romanian Transition Alphabet

Free Public AI Model for Handwritten Text Recognition with Transkribus

This is a first attempt at a model trained on texts printed in the Romanian Transition Alphabet (1830-1862).

Historical Context and Transcription Principles (by Roxana Patras)

The Romanian transition alphabet is a combination of Cyrillic and Latin characters that was used in print between approximately 1830, after the end of Phanariote rule and the establishment of the Russian protectorate (the Organic Regulation), and approximately 1862, when the first ruler of the United Romanian Principalities (Moldavia and Wallachia) passed a law requiring typographers to use only Latin glyphs.

Mixed in a way that makes Romanian printed texts (press, legal documents, and fiction) resemble avant-garde Dada poetry, Cyrillic and Latin letters are not distributed evenly across the three decades of the transition. Because the rules for using Latin instead of Cyrillic letters were set by typographers' habits and practices rather than by convention or linguistic standards, they are somewhat idiosyncratic and depend heavily on the location of the printing house. Compared with Wallachian prints (Bucharest), Moldavian prints (Iaşi) show certain dialectal particularities. Moreover, even a random sample shows that the share of Cyrillic letters slowly decreases over time. However, for some consonants, chiefly palatals, both Cyrillic and Latin letters may be used side by side in one and the same text: K/k and Ч/ч (che); G/g, Г/г (ghe) and Џ/џ (dze); S/s and C/c (es).
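The observation that the share of Cyrillic letters falls over time can be checked on any transcription that preserves the original letter forms. The helper below is a hypothetical sketch, not part of Transkribus: `cyrillic_share` counts, letter by letter, how many alphabetic characters have a Unicode name beginning with "CYRILLIC". It assumes the transcriber typed Cyrillic code points for Cyrillic letters; look-alike pairs such as Latin C and Cyrillic С are indistinguishable on screen, so the result is only as reliable as the input encoding.

```python
import unicodedata


def cyrillic_share(text: str) -> float:
    """Return the fraction of alphabetic characters in `text` that are Cyrillic.

    Mixed words such as "Бukete" are counted letter by letter, using the
    Unicode character name as the script test (hypothetical helper).
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("CYRILLIC")
    )
    return cyrillic / len(letters)


if __name__ == "__main__":
    # One Cyrillic letter (Б) out of six letters.
    print(round(cyrillic_share("Бukete"), 3))  # → 0.167
```

Comparing this ratio across pages from the 1853 and the 1861 samples would be one simple way to quantify the progression described above.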

To train an HTR model for these texts, I chose 5 samples that show, before and after 1859 (when the two Romanian provinces became a country with an official language), the progression from a massive use of Cyrillic letters to a more reader-friendly mix that makes reading more fluent.

The paratextual information describes the 5 texts as “original” or “historical” novels. Given that the first Romanian novel was published in 1845 by a mysterious author signing as D.F.B. (Elvira sau amorul fără de sfârşit. Romans original [Elvira, or the Never-ending Love: An Original Romance]), Romanian literary tradition regards these texts as founding pieces of a sort:

1.     PELIMON Al., Hoţii şi Hagiul. Roman istoric, Buc., Tip. Sfintei Mitropolii, 1853, 117 p.

2.     BOERESCU Costache, Aldo şi Aminta sau Bandiţii, Buc., Tip. Bisericească din Sf. Mitropolie, 1855, 164 p.  

3.     PELIMON Al., Jidovul cămătar. Moldova şi Bucovina, Buc., Tip. Stephan Rassidescu, 1863, 292 p.

4.     PELIMON Al., Bucur, istoria fundării Bucureştilor, Buc., Tip. Nationala Iosif Romanov, 1858, 251 p.

5.     ARICESCU C.D., Misterele căsătoriei, I. Bărbatul predestinat, Buc., Tip. Stephan Rassidescu, 1861, 179 p.

As a general rule, Latin capital letters are preferred for writing titles after 1859.
The Latin letters Z/ z, M/ m, D/ d, S/ s, T/ t, N/ n, A/ a, I/ i, E/ e, O/ o, Î/ î, U/ u, Ŭ/ ŭ, Ĭ/ ĭ are present from the oldest sampled text (1853) onward, whereas that same text still uses the Cyrillic letters Х/х (ha), Ш/ ш (sha), Щ/ щ (shcha), Ц/ ц (tze), Џ/ џ (dze), Ч/ ч (che), Ъ/ ъ (ă), П/ п (pe), Р/ р (er), Ж/ ж (zhe), Ф/ф (ef), К/ к (ca), В/ в (ve), Л/ л (el), Г/ г (ghe), Б/ б (be).
Among these Cyrillic letters, the first to receive a Latin equivalent are Ф/ф (ef) → f; Г/ г (ghe) → g; Л/ л (el) → l; Ж/ ж (zhe) → j. By contrast, Р/ р (er), П/ п (pe), Ъ/ ъ (ă), Ч/ ч (che), В/ в (ve), Ш/ ш (sha), Щ/ щ (shcha) and Ц/ ц (tse) tend to be retained until 1862, when some of them are replaced with glyphs such as “ḑ” (dz), “ş” (sh) and “ț” (tz), which were imported from the Livonian alphabet but entered the printing circuit only after 1865.

The general guidelines for transcription have been established as follows:

1.     Creation of the collection “ALFABET DE TRANZITIE” containing 6 items.

2.     Transcription of randomly chosen pages from the beginning, middle, and end of each text:

2.1.  PELIMON Al., Hoţii şi Hagiul. Roman istoric (1853): pages 6, 7, 8, 52, 53, 54, 76, 77, 78, 79, 80.

2.2. BOERESCU Costache, Aldo şi Aminta sau Bandiţii (1855): pages 8, 9, 10, 36, 37, 38, 114, 115, 116, 148, 150.

2.3. PELIMON Al., Jidovul cămătar. Moldova şi Bucovina (1863): pages 5, 6, 7, 8, 49, 50, 51, 100, 101, 102.

2.4. PELIMON Al., Bucur, istoria fundării Bucureştilor (1858): pages 5, 6, 7, 84, 85, 86, 87.

2.5. ARICESCU C.D., Misterele căsătoriei, I. Bărbatul predestinat (1861): pages 1, 2, 56, 57, 58, 97, 98, 99, 133, 135, 136.

3.     One-to-one transliteration of all Cyrillic letters, except that K/k is kept where it stands for the group Ch/ch (e.g. Бukete → Bukete):

Х/х → H/h; Ш/ш → Ș/ș; Щ/щ → Șt/șt; Ц/ц → Ț/ț; Ч/ч → C/c;

Ъ/ъ → Ă/ă; П/п → P/p; C/c → S/s; Р/р → R/r; Ж/ж → J/j; Ф/ф → F/f;

К/к → C/c; В/в → V/v; Л/л → L/l; Г/г → G/g; Б/б → B/b; Џ/џ → G/g.

4.     Customization of the following glyphs:

apostrophe, right double quotation mark, double low-9 quotation mark, Ŭ/ ŭ, Ĭ/ ĭ, á.
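The transliteration in item 3 above is mechanical enough to express as a lookup table. The sketch below is a hypothetical helper, not a Transkribus feature: it builds the table from official Unicode character names (so Cyrillic letters cannot be confused with look-alike Latin ones) and applies it with Python's `str.translate`. Latin letters pass through unchanged, which covers the K/k exception; Cyrillic letters missing from the table are also left as they are. Using the comma-below forms Ș/ș and Ț/ț (U+0218 to U+021B) rather than cedilla forms is an assumption based on the glyphs shown in the table.

```python
import unicodedata

# (Cyrillic letter name, Latin uppercase, Latin lowercase), following item 3.
_RULES = [
    ("HA", "H", "h"),
    ("SHA", "\u0218", "\u0219"),        # Ș/ș, comma below (assumed, not cedilla)
    ("SHCHA", "\u0218t", "\u0219t"),    # Șt/șt
    ("TSE", "\u021A", "\u021B"),        # Ț/ț, comma below (assumed, not cedilla)
    ("CHE", "C", "c"),
    ("HARD SIGN", "\u0102", "\u0103"),  # Ă/ă
    ("PE", "P", "p"),
    ("ES", "S", "s"),
    ("ER", "R", "r"),
    ("ZHE", "J", "j"),
    ("EF", "F", "f"),
    ("KA", "C", "c"),
    ("VE", "V", "v"),
    ("EL", "L", "l"),
    ("GHE", "G", "g"),
    ("BE", "B", "b"),
    ("DZHE", "G", "g"),
]


def _build_table() -> dict[int, str]:
    """Map code points of the Cyrillic letters to their Latin replacements."""
    table: dict[int, str] = {}
    for name, upper, lower in _RULES:
        table[ord(unicodedata.lookup(f"CYRILLIC CAPITAL LETTER {name}"))] = upper
        table[ord(unicodedata.lookup(f"CYRILLIC SMALL LETTER {name}"))] = lower
    return table


_TABLE = _build_table()


def transliterate(text: str) -> str:
    """Replace each listed Cyrillic letter; every other character is kept as-is."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(transliterate("Бukete"))  # → Bukete
```

Looking letters up by name rather than typing them makes the table auditable against the Unicode charts, which matters for a script where several Cyrillic and Latin capitals look identical.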
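Because an HTR model treats visually similar characters as distinct symbols, it can help to pin the glyphs from item 4 above to exact code points when preparing ground truth. In this sketch, reading "apostrophe" as U+0027 is an assumption (the typographic U+2019 is a common alternative); the other entries are the standard Unicode characters with the names listed in item 4, and `CUSTOM_GLYPHS` is an illustrative name.

```python
import unicodedata

# Glyphs from item 4. U+0027 for "apostrophe" is an assumption; U+2019 is
# the typographic alternative.
CUSTOM_GLYPHS = [
    "'",       # apostrophe
    "\u201D",  # right double quotation mark
    "\u201E",  # double low-9 quotation mark
    "\u016C",  # Ŭ
    "\u016D",  # ŭ
    "\u012C",  # Ĭ
    "\u012D",  # ĭ
    "\u00E1",  # á
]

if __name__ == "__main__":
    # Print code point, glyph and official Unicode name for each entry.
    for ch in CUSTOM_GLYPHS:
        print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
```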

Model Overview

Name:
RTA2 (Romanian Transition Alphabet)
Creator:
Roxana Patras
Model ID:
51515
Century:
19th
Languages:
Romanian
Script:
Romanian Transition Alphabet
Engine:
PyLaia
Material:
Print
CER on validation set:
2.80 %

RTA2 (Romanian Transition Alphabet) is freely available to everyone

Get started with Transkribus and use it for your own material
You can use this model to automatically transcribe printed documents with Handwritten Text Recognition in Transkribus. This model can be used in both the Transkribus Expert Client and Transkribus Lite.
This AI model was trained to automatically convert text from images of historical Romanian Transition Alphabet documents into editable and searchable text.