sudo dnf install webkitgtk
and unpack it into the Transkribus folder – the Transkribus.command file will automatically check for Java installations in its subdirectories.
If you encounter mysterious error messages such as “already connected”, your Java installation may be out of date. Please update Java (https://www.java.com/de/download/) and try again.
For Mac users: if the integrated Java version is outdated, try downloading the latest version of Transkribus from our homepage and replacing the complete installation on your computer.
[Pro tip: the Mac version of the expert client ships with its own Java runtime inside the application. If this Java version is outdated, you can try to delete it or replace it with an updated version. To find the files in the Finder, right-click (or Control-click) the Transkribus application in your Applications folder, choose “Show Package Contents” from the context menu, and go to the subfolder “Contents/MacOS”. There, the subfolder “jre” contains the bundled Java version. If you delete this folder, the application launcher will try to find Java on your system.]
On newer MacBooks (2016 onwards), there seems to be a problem when starting Transkribus directly from the Downloads folder. Try copying the application to a folder where you have full access rights (e.g. the Applications folder) and start it from there.
If your startup problems persist, you can try a workaround that starts the app from the terminal.
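If you are unsure which Java your system provides (the one the launcher falls back to once the bundled “jre” folder is gone), running `java -version` in a terminal tells you. The Python sketch below automates that check. It assumes Python 3.7+, the helper names are hypothetical, and it only inspects the `java` found on your PATH. Since no minimum Java version is stated here, it simply reports the major version instead of judging it.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output (hypothetical helper)."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        return None
    numbers = re.findall(r"\d+", match.group(1))
    if not numbers:
        return None
    # Legacy scheme: "1.8.0_281" means Java 8; modern scheme: "17.0.2" means Java 17.
    if numbers[0] == "1" and len(numbers) > 1:
        return int(numbers[1])
    return int(numbers[0])


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if none is usable."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("No usable java executable found on PATH.")
    else:
        print(f"Java on PATH reports major version {major}.")
```

Keep in mind that on a Mac the launcher uses the bundled “jre” folder while it exists, so this check only shows what it would fall back to.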
Logging in to the server fails in Transkribus but works on the website.
Java heap space / No more handles
Logging in is blocked by your Internet provider’s firewall
Norton Antivirus detects a threat and blocks the zip file from being unpacked.
Versions 0.6.5 and older cannot update (very long error message):
If you are not sure whether you need a proxy server, getting “Login failed: already connected” as an error message when trying to log in is most likely an indication that you do.
java -Dhttps.proxyHost=<proxyserver> -Dhttps.proxyPort=<proxyPort> -Dhttps.proxyUser=<user name for proxy> -Dhttps.proxyPassword=<password for proxy> -jar Transkribus-0.7.0.jar
However, you will have to edit this file again after every Transkribus update.
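If you do not know your proxy settings, your operating system may already have them configured. The Python sketch below is one way to turn a proxy URL into a command line like the `java -D... -jar` one above. Python's standard `urllib.request.getproxies()` reads the proxy from environment variables (and, on Windows and macOS, from the system settings), and `urlsplit` breaks the URL into host, port, user and password. The function name `proxy_java_command` is hypothetical. The `-D` property names and the jar name `Transkribus-0.7.0.jar` are copied from the command above, so adjust the jar name to your installed version.

```python
import shlex
from typing import List
from urllib.parse import unquote, urlsplit
from urllib.request import getproxies


def proxy_java_command(proxy_url: str, jar: str = "Transkribus-0.7.0.jar") -> List[str]:
    """Build the `java -D... -jar` argument list for a proxy URL (hypothetical helper)."""
    if "://" not in proxy_url:
        # Accept bare "host:port" values by assuming an http:// proxy URL.
        proxy_url = "http://" + proxy_url
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL needs a host and a port: {proxy_url!r}")
    cmd = [
        "java",
        f"-Dhttps.proxyHost={parts.hostname}",
        f"-Dhttps.proxyPort={parts.port}",
    ]
    # Credentials are optional; they are percent-decoded if present in the URL.
    if parts.username:
        cmd.append(f"-Dhttps.proxyUser={unquote(parts.username)}")
    if parts.password:
        cmd.append(f"-Dhttps.proxyPassword={unquote(parts.password)}")
    cmd += ["-jar", jar]
    return cmd


if __name__ == "__main__":
    proxies = getproxies()
    proxy = proxies.get("https") or proxies.get("http")
    if proxy is None:
        print("No proxy found in the environment or system settings.")
    else:
        # Note: this prints the proxy password in clear text if one is configured.
        print(" ".join(shlex.quote(arg) for arg in proxy_java_command(proxy)))
```

If the script reports no proxy, that does not rule one out: some networks require a proxy that is only known to your IT department.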
sudo apt install libwebkitgtk-1.0-0
Our long-term goal is to train so many different writing styles that Transkribus will be able to deal with most handwritten documents without prior training. The more users work with Transkribus for their transcription, the faster we will reach this ambitious goal!
Yes, we now have a Text2Image matching tool that can match existing text with an image. If you have lots of existing transcripts and would like to use them to train an HTR model, please consult our How to Guide.
No! Documents uploaded to Transkribus are private by default. You can use the “Manage collections…” button in the “Server” tab of Transkribus to allow specific users to view and/or edit your collection if you wish.
In theory, yes! The software needs to be trained to understand each style of handwriting. Every piece of training data submitted to Transkribus is helping to strengthen the overall accuracy of the HTR.
Both technologies are very similar, but OCR is already mature, whereas HTR is still at an early stage. Unlike OCR, HTR does not focus on individual letters. Instead, it processes the image of an entire line at once and tries to decode it. From the user’s point of view, the main difference is that Layout Analysis/Segmentation is built into the OCR engine, whereas for HTR it is a separate step in the workflow.
HTR is not 100% accurate, but impressively low Word and Character Error Rates are possible. The latest experiments have produced transcripts with a Character Error Rate (CER) of around 5%, meaning that roughly 95% of the characters in an automatically generated transcript are correct. For some successful examples of HTR, have a look at our Example Documents or our Success Stories from the READ project blog. You can measure the accuracy of your HTR model in Transkribus using the ‘Compare’ function in the ‘Tools’ tab.
You can use your HTR model to automatically generate transcripts of your documents by clicking the “Run text recognition” button in the “Tools” tab in Transkribus. You can export your documents, or search them in Transkribus by clicking the “Search” button in the main menu. You can now also search your documents using our new Keyword Spotting tool.
You should then contact the Transkribus team by email at firstname.lastname@example.org. They can activate the training button in Transkribus for you. This way you can create an HTR model that is specific to the collection of documents you have been working with in Transkribus. Find out more in our How to Guide.
The more training data, the better! But you can start training the HTR with as few as 75 pages (15,000 words) of documents written in a neat hand.
Firstly, you need to upload your documents to the platform. Secondly, you need to segment the pages of your collection into text regions and baselines. Thirdly, you need to transcribe each page as accurately as possible. For more information on these stages, have a look at our How to Guides.
HTR engines cannot process text straight away – they need to be trained to recognise a certain style of handwriting. This can be achieved by creating at least 75 pages (15,000 words) of training data (images and transcripts) in Transkribus.
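Error rates like these are commonly defined via edit distance: CER is the number of character substitutions, deletions and insertions needed to turn the automatic transcript into the correct one, divided by the number of characters in the correct text, and WER is the same measure on words. The Python sketch below illustrates that common definition. It is an assumption that ‘Compare’ computes exactly this, and the sample strings are made up.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if the items match)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate relative to the (non-empty) correct reference text."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: the same measure computed on whitespace-separated words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = "the quick brown fox!"   # 20 characters, 4 words
hypothesis = "the quick brovn fox!"  # one wrong character
print(f"CER = {cer(reference, hypothesis):.2%}")  # CER = 5.00%
print(f"WER = {wer(reference, hypothesis):.2%}")  # WER = 25.00%
```

A single wrong letter costs 5% CER here but 25% WER, because it spoils a whole word, which is why WER figures are usually much higher than CER figures for the same transcript. Insertions can also push CER above 100%, so reading a 5% CER as “95% correct” is a convenient approximation.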