If you’re new to Transkribus, you probably have lots of questions about the platform. How do I transcribe documents? What’s a model? How do I even log in?
Many of these questions can be answered by a visit to our Help Center, which contains information about every feature and function of Transkribus. However, to help you out even further, we’ve put together a list of the five most common questions asked by new Transkribus users, along with step-by-step instructions for each one.
- How do I upload files to Transkribus?
- How do I export files from Transkribus?
- How do I check the status of a job?
- Which public model is best for my material?
- What is a CER and how can I improve the CER of my model?
1) How do I upload files to Transkribus?
The first step in the text recognition process is to upload an image of the document you want to transcribe. This can be in JPEG, PNG, or PDF format.
To upload a document, navigate to the Desk and select Upload Files. Select the collection you wish to upload files to, and then select the files themselves.
The majority of documents uploaded have more than one page. However, you do not have to upload each page individually.
If you are uploading JPEGs or PNGs, all images selected in one upload will be uploaded as one document and each image will then become one page of the document. If you are uploading PDFs, each page of the PDF is extracted and uploaded as a page of the document.
For more information about uploading documents to Transkribus, visit our Help Center.
2) How do I export files from Transkribus?
You can download or export your document images and transcriptions from Transkribus, allowing you to work with them outside of the platform.
To export documents, select the collection, documents, or pages you wish to download. Click on the three dots in the top menu and select Export. Choose an export format from the list and select Start export.
For more information about the different export formats that are possible with Transkribus, please visit our Help Center.
3) How do I check the status of a job?
A Transkribus “job” is any task you ask the platform to do, such as uploading documents, carrying out text recognition, or training a model. Depending on the size of the job, you may have to wait a short time until it is completed. During this time, you can continue working on the platform or even close it completely. Any uncompleted jobs will keep running in the background.
You can check the status of your jobs at any time by clicking on Jobs in the top right of the screen. Here you will see a full list of all your current and previous jobs, regardless of whether they were started in Desk or Sites. Each one is given a label such as Created, Running, Finished, or Failed.
If your job is labelled Created or Running, it is not yet complete. Under Description, you can see how many jobs are ahead of yours in the queue. This should give you an indication of how long you will have to wait.
If your job is labelled Failed, you should first try running the job again. If it consistently fails, then you can create a support ticket and select Job failed in the Topic field. This alerts the Transkribus team to the problem.
You can cancel a job at any time by clicking the three dots under Action and selecting Cancel.
4) Which public model is best for my material?
To transcribe text in Transkribus, you need a text recognition model. The model tells the platform how to transcribe the text in your document.
The easiest way to do this is to select a public model. These are models that have been trained by the Transkribus community and made public for everyone to use. You can view all the public models on this page or in the Gallery of the Models section of Transkribus.
Each model is labelled with a language (e.g. German), a script (e.g. Latin alphabet), a time period (e.g. 17th century), and whether it is suitable for handwritten or printed text. When choosing a public model, it’s important to pick one whose language, script, time period, and text type are the same as, or similar to, those of your documents. This will give the most accurate transcriptions. It may also be worth trying out a couple of different models to see which is the best fit.
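The matching advice above can be sketched as a simple scoring exercise. The Python below is only an illustration of the reasoning, not a Transkribus feature or API: the `Labels` class, the scoring weights, and the model names are all hypothetical, and in practice you would compare the labels by eye in the model gallery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labels:
    """Hypothetical container for the four labels described above."""
    name: str
    language: str
    script: str
    century: int
    text_type: str  # "handwritten" or "printed"


def match_score(model: Labels, document: Labels) -> float:
    """Count how many labels agree; the weights are an assumption."""
    score = 0.0
    score += 1.0 if model.language == document.language else 0.0
    score += 1.0 if model.script == document.script else 0.0
    score += 1.0 if model.text_type == document.text_type else 0.0
    # Treat a neighbouring century as a partial ("similar") match.
    gap = abs(model.century - document.century)
    score += 1.0 if gap == 0 else 0.5 if gap == 1 else 0.0
    return score


def rank_models(models: list[Labels], document: Labels) -> list[Labels]:
    """Return the models ordered from best to worst label match."""
    return sorted(models, key=lambda m: match_score(m, document), reverse=True)


if __name__ == "__main__":
    # Made-up models and document, for illustration only.
    my_documents = Labels("my letters", "German", "Latin", 17, "handwritten")
    candidates = [
        Labels("Example model A", "German", "Latin", 19, "printed"),
        Labels("Example model B", "German", "Latin", 17, "handwritten"),
        Labels("Example model C", "Dutch", "Latin", 18, "handwritten"),
    ]
    for model in rank_models(candidates, my_documents):
        print(model.name, match_score(model, my_documents))
```

A high score only suggests which models are worth trying first. Running two or three candidates on a sample page and comparing the results remains the more reliable test.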
We also have two Super Models, the Text Titan I and the Dutch Demeter I, with more currently being trained. Super Models are more powerful transformer-based models, capable of transcribing many different types of material. They are therefore great for collections that include a range of languages or scripts, or a mix of handwritten and printed documents.
For more information about choosing a model, visit our Help Center.
5) What is a CER and how can I improve the CER of my model?
Every model is also assigned a Character Error Rate, or CER. This is a number between 0% and 100% that shows the percentage of characters the model transcribes incorrectly, so the lower the CER, the more accurate the model. A model with a CER of 100% will produce a very inaccurate transcription, whereas a model with a CER of 0% will give a perfect, error-free transcription. For best results, you should aim to use models with a CER of 10% or less.
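To make the idea concrete, here is a minimal sketch of how a character error rate is commonly calculated: count the smallest number of character insertions, deletions, and substitutions needed to turn the model's output into the correct text, then divide by the length of the correct text. This is the general textbook definition and an assumption on our part for this example. The exact details of how Transkribus computes the figure, such as how spaces or punctuation are handled, are not covered here, and the function names are our own.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions between two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a percentage of the reference length.

    Note: this sketch does not cap the result, so an output with
    many extra characters could score above 100.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    correct_text = "Dear Sir"   # what the page actually says (8 characters)
    model_output = "Dear Sit"   # one wrong character
    print(character_error_rate(correct_text, model_output))  # 12.5
```

In this example one character out of eight is wrong, giving a CER of 12.5%, which is just above the 10% guideline mentioned above.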
With Transkribus, it is also possible to train your own text recognition model, tailored to the handwriting or print in your specific documents. If you are new to training models, you might find it difficult to produce a model with a CER of under 10%. However, with a few helpful tips, you can quickly improve the CER of your custom model.
For more information about improving the CER of your model, visit our Blog.
Have another Transkribus question?
Our Help Center is a mine of information for all things Transkribus. From creating an account to training a model and using the ScanTent, it has step-by-step instructions for all the different features of Transkribus.
We also have a Tutorials playlist on YouTube, where team members walk you through all the tasks you can do with Transkribus, so you can follow along in real time and get the most out of the platform.