Last update of this guide 04/09/2020
This document is a basic introduction to Transkribus. It provides a simple standard workflow for working with the platform. If you need more detailed instructions on the functions of Transkribus please have a look at our How to Guides, which can be found in the Knowledge Base of the READ-COOP SCE homepage: https://readcoop.eu/transkribus/knowledge-base/how-to-guides/
Download the Transkribus Expert Client, or make sure you are using the latest version:
1 – Introduction
Transkribus can be used for several purposes. The most important are:
- Transcribe documents for a scholarly edition
- Create training data to feed the Handwritten Text Recognition (HTR+) system so it can learn to decipher your historical documents.
- Run HTR+ on your documents and receive automatically generated transcripts.
- Search for distinct words in your document collections with Keyword Spotting, which is much more powerful than standard full-text search.
- The platform thrives on its community: the more data is uploaded to Transkribus, the more effective the program, and especially the Handwritten Text Recognition, will become.
Transkribus is a research infrastructure, which was established as part of the H2020 Project READ (Recognition and Enrichment of Archival Documents).
Take some time to explore Transkribus and become familiar with how it works. To make it easier we have created several How to Guides, which give instructions on the different functions of the platform. You can find them within our Knowledge Base: https://readcoop.eu/transkribus/knowledge-base/how-to-guides/
2 – To use Transkribus – register at the website
3 – Download Transkribus from the website
- Go to the Transkribus website http://transkribus.eu/ and click “Download”.
- Transkribus runs on Windows, macOS and Linux. If you need help installing the platform, you can have a look here: https://readcoop.eu/transkribus/wiki/download-and-installation/
- If you use macOS, an error message may appear when you try to open Transkribus for the first time. To remedy this:
- Right-click on the trackpad to open the context menu and add a security exception for Transkribus.
- Once you have downloaded Transkribus, make sure you unzip the file. The program cannot be started from the zipped file.
4 – Open Transkribus
- Start the tool and use the “Login” button in the “Server” tab.
Figure 1 Login
- You will have access to your private collection named after your email address. This collection includes some test documents that you can experiment with.
- You can find it by clicking the “Collections” button in the “Server” tab.
Figure 2 Test documents in your collection
5 – Upload your documents
- Transkribus allows you to work with your own documents, either locally or by uploading them to the server.
- Automated processes can only be performed if the documents are uploaded to the Transkribus platform. The platform can process PDF, JPEG, PNG and TIFF files. Unfortunately, JP2 files are not supported.
- You can upload documents which you have scanned yourself. You can also use our DocScan app for Android smartphones to take images and upload them directly to Transkribus. For more information: https://scantent.cvl.tuwien.ac.at/en
- You may also download documents from the Internet and upload them to Transkribus. Many libraries and archives follow Open Access policies and therefore encourage further use of their collections. You can ask archives and libraries directly whether you may upload images of their documents to Transkribus.
- Click the “Import document(s)” button to transfer the images from your computer to the platform. Note: the images need to reside in a separate folder on your computer before you upload them to Transkribus!
Figure 3 Upload your documents to Transkribus
- You can add your documents to one of your existing collections or create a new one by clicking the “Add to collection” button at the bottom of the “Document ingest/upload” box and then clicking “Create”.
Figure 4 Add documents to one of the existing collections or create a new one
Figure 5 Create your own collection
- To access your documents, click on the “Collections” button in the “Server” tab and choose your collection. Then double-click on the documents in the box at the bottom of the “Server” tab to open them.
Figure 6 Open the documents in your collection
- All documents uploaded to Transkribus are private by default. You can give other users authorisation to view your documents if you wish. Use the “User Manager” button in the “Server” tab to add users to your collection. You can only share collections with users who have a Transkribus account.
Figure 7 “User Manager” button for managing access to your collection
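The upload requirements in this section (images collected in a separate folder, and only PDF, JPEG, PNG and TIFF accepted) can be checked before importing. The following is a minimal sketch, not part of Transkribus itself: the function name `check_upload_folder` and the exact set of file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Extensions for the formats this guide lists as supported (PDF, JPEG, PNG, TIFF).
# The exact spelling variants accepted by the platform are an assumption.
SUPPORTED = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_upload_folder(folder):
    """Split the files in `folder` into supported and unsupported file names."""
    supported, unsupported = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore sub-folders
        if path.suffix.lower() in SUPPORTED:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported
```

Calling `check_upload_folder("my_scans")` on a hypothetical folder returns two lists; any file in the second list (for example a JP2 image) would need to be converted before the upload.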
6 – Segment your documents into lines
- To provide the HTR engine with training data, the documents need to be segmented into lines. This can be done automatically in Transkribus.
- Open the “Tools” tab.
- Make sure “Find Text Regions” is selected and press “Run”.
- You can choose to segment the current page or a batch of pages.
- The lines and text regions in your document will be detected automatically.
Figure 8 Segmentation
7 – Start your transcription
- Once the baselines are visible on your image, you can write text into the Text Editor field.
- Click on the “Viewing Profiles” button and select the “Transcription” view.
- For each baseline, there will be a corresponding line in the Text Editor. Transcribe the text line by line, exactly as it appears in the image.
Figure 9 Transcription view
- Special characters can be inserted via the “Virtual Keyboards” button in the Text Editor toolbar.
Figure 10 “Virtual Keyboards” button
Figure 11 Virtual keyboards
- If you are working in a team, you might find it easier to transcribe in the Transkribus Web Interface. This is a lite version of Transkribus which is simple to use: https://transkribus.eu/r/read/projects/
8 – Save and export your transcription
Figure 12 Saving the changes in your document
- Press the “Save” button in the Main Menu to save the document in Transkribus.
- If you click on the “Versions” button in the “Server” tab, you will see that a new version has been created. This means that you can always access previous versions of a document should you need to.
Figure 13 Click the “Versions” button to access previous versions of your document
- You can also export the whole document at any point of the process by clicking the “Export document” button.
Figure 14 “Export document” button
9 – Use Handwritten Text Recognition (HTR) on your documents
- It is simple to have your documents recognised by the computer. You can start training a model with around 5,000 transcribed words of printed text or 15,000 words of handwritten text. To start the training process, please drop us a short email once you have segmented and transcribed a first batch of pages (email@example.com).
- We will then give you permission to train your own model. If you need more information on this, please check the How to Train a Model guide.
- Once an HTR model has been trained for your documents, it can be applied via the “Run” button in the “Text Recognition” section in the “Tools” tab. You can select one or more pages of your documents and start recognition.
Figure 15 Run Handwritten Text Recognition
Figure 16 Model overview and learning curve
- If you click “Run” and then “Select HTR model”, you can choose the model for the recognition and get more information about it.
- On the left side of the window you can see an overview of the available models.
- On the top right side of the window the details of the model are shown.
- The graph on the bottom right shows the accuracy of your model as the Character Error Rate (CER), i.e. the percentage of characters that have been transcribed incorrectly by HTR. The blue line represents the progress of the training. The red line represents the progress of evaluations on the Test Set, the data which was set aside during the training process.
- After the HTR has finished, the results will appear directly in a new version of your document within Transkribus. It is possible to evaluate the accuracy of the automatic transcription using the “Compute Accuracy” function in the “Tools” tab.
Figure 17 Compute the accuracy of the HTR
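To make the Character Error Rate described above more concrete, here is a minimal sketch of how such a value can be computed. It assumes the common definition (character-level edit distance between a correct reference transcription and the HTR output, divided by the length of the reference); the guide does not specify how Transkribus computes or normalises the figure internally, and the function names are illustrative only.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of character insertions,
    deletions and substitutions needed to turn reference into hypothesis."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER in percent, relative to the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

For example, comparing the reference line "abcd" with a recognised line "abcf" gives one wrong character out of four, so `character_error_rate("abcd", "abcf")` is 25.0.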
10 – Keyword Spotting
- Once you have an HTR model for your documents, you will be able to search them with the Keyword Spotting function. You can also use one of the public models for this, if a suitable one is available.
- First, run the HTR model on your documents to produce an automatic transcript.
- Then open the Keyword Spotting function with the binoculars button shown in Figure 18.
Figure 18 Open the “Search for…” window to use the Keyword Spotting function
- In the window which opens up choose the “KWS” tab.
Figure 19 Window to use the Keyword Spotting function
- Simply type the word you would like to search for in the “Keyword 1” box and press the “Search” button.
- A confirmation window will pop up. Click “Yes” to start your Keyword Spotting query.
Figure 20 Confirmation window
- Once your search query is finished, double-click the date and numerical value in the “Created” column to access your search results.
Figure 21 Keyword Spotting results
- The “Keyword Spotting Results” window will show you a list of places where that keyword appears.
Figure 22 Information about your Keyword Spotting results
We would like to thank the many users who have contributed their feedback to help improve the Transkribus software.
Consult the Transkribus Wiki for further information and other How to Guides:
Transkribus and the technology behind it are made available via the following projects and sites:
The Transkribus Platform is provided by the European Cooperative READ-COOP SCE.
Until June 2019 Transkribus was financed as part of the Horizon 2020 READ-project under grant agreement No. 674943.