This model is trained on the Russian part of bilingual Evenki/Russian manuscripts by Konstantin Rychkov, dated 1911-1913 and written in pre-reform Cyrillic orthography. The texts cover various genres (folklore, personal narratives, shamanistic rites); they were collected in different Evenki dialects and translated into Russian. The training set comprises 581 half-pages (59,300 words) from several folders of the Rychkov archive. The model was trained with the HTR+ engine for 120 epochs.
The model is provided by the INEL project (“Grammatical Descriptions, Corpora, and Language Technology for Indigenous Northern Eurasian Languages”), https://inel.corpora.uni-hamburg.de/.
The archive of Konstantin Rychkov is preserved at the Institute of Oriental Manuscripts of the Russian Academy of Sciences (RAS), http://www.orientalstudies.ru/. Lower-resolution versions of selected manuscripts are accessible at http://www.orientalstudies.ru/rus/index.php?option=com_content&task=view&id=10326&Itemid=149