REST API

The Transkribus application communicates with the server via a set of RESTful service methods.

Visualization with SWADL

The complete definition can be found in the file application.wadl, which can be visualized with SWADL.
The parameters for all the methods described below are listed in that service description file.

Login

Most methods require the user to be logged in to the services. This is done by POSTing the user credentials to the login method:

https://transkribus.eu/TrpServer/rest/auth/login

The method returns an XML document containing the user profile and the collections this user is allowed to access.
Subsequent requests to the service must then include the session ID from this XML, either:

in the HTTP header as a cookie named “JSESSIONID”, or
as the request parameter “JSESSIONID”.
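The login flow can be sketched with Python's standard library as follows. This is a minimal, non-authoritative sketch: it assumes the credentials are sent as the form fields `user` and `pw` and that the session ID appears in an element named `sessionId` in the returned XML. Neither name is given above, so check both against application.wadl. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

LOGIN_URL = "https://transkribus.eu/TrpServer/rest/auth/login"


def build_login_request(user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the credentials.

    Assumption: the form field names "user" and "pw" are not stated in the
    documentation; verify them in application.wadl.
    """
    body = urllib.parse.urlencode({"user": user, "pw": password}).encode("utf-8")
    return urllib.request.Request(
        LOGIN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def extract_session_id(login_xml: str) -> str:
    """Return the session ID from the login response XML.

    Assumption: the ID is stored in a descendant element named "sessionId".
    """
    root = ET.fromstring(login_xml)
    node = root.find(".//sessionId")
    if node is None or not node.text:
        raise ValueError("no sessionId element in login response")
    return node.text.strip()


def add_session_cookie(request: urllib.request.Request,
                       session_id: str) -> urllib.request.Request:
    """Attach the session ID as the JSESSIONID cookie."""
    request.add_header("Cookie", f"JSESSIONID={session_id}")
    return request
```

To log in, pass the request to `urllib.request.urlopen`, read the response body and hand it to `extract_session_id`; every later request can then be passed through `add_session_cookie`.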

Collections

Once a user is authenticated, the collections to which the user has access can be listed with a GET request to:

https://transkribus.eu/TrpServer/rest/collections/list

The call returns a list of collections; each entry contains the collection ID, the collection name and the current user's role in that collection.
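Listing collections can be sketched as below. The sketch assumes the server returns JSON when asked via the `Accept` header and that each entry uses the field names `colId`, `colName` and `role`; the text above only names the concepts, so these names are assumptions to verify.

```python
import json
import urllib.request

COLLECTIONS_URL = "https://transkribus.eu/TrpServer/rest/collections/list"


def build_collections_request(session_id: str) -> urllib.request.Request:
    # GET request authenticated with the JSESSIONID cookie.
    # Assumption: the server honours "Accept: application/json".
    return urllib.request.Request(
        COLLECTIONS_URL,
        headers={"Cookie": f"JSESSIONID={session_id}", "Accept": "application/json"},
    )


def parse_collections(body: str) -> list[tuple[int, str, str]]:
    """Return (collection ID, name, role) triples.

    Assumption: JSON field names colId, colName and role.
    """
    return [(c["colId"], c["colName"], c["role"]) for c in json.loads(body)]
```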

Documents in a collection can be listed analogously with a GET request to:

https://transkribus.eu/TrpServer/rest/collections/{collection-ID}/list

This call returns a list of document metadata objects.
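Listing documents follows the same pattern. This sketch builds the request for a given collection and maps document IDs to titles, assuming JSON entries with the fields `docId` and `title` (assumed names, not taken from the text above).

```python
import json
import urllib.request

BASE = "https://transkribus.eu/TrpServer/rest"


def build_documents_request(collection_id: int, session_id: str) -> urllib.request.Request:
    # GET {BASE}/collections/{collection-ID}/list with the session cookie.
    url = f"{BASE}/collections/{collection_id}/list"
    return urllib.request.Request(
        url,
        headers={"Cookie": f"JSESSIONID={session_id}", "Accept": "application/json"},
    )


def parse_documents(body: str) -> dict[int, str]:
    """Map document ID to title (assumed field names: docId, title)."""
    return {d["docId"]: d["title"] for d in json.loads(body)}
```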

To retrieve a complete document, send a GET request to:

https://transkribus.eu/TrpServer/rest/collections/{collection-ID}/{document-ID}/fulldoc

The returned object consists of the document metadata and the complete page list, where each page contains, among other attributes:

A link to the page image
A list of transcript files where each transcript contains:
- A link to the PAGE XML file
- The responsible user’s ID and name
- A timestamp
- The edit status of the transcript, e.g. NEW, IN_PROGRESS, DONE, FINAL
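The page list described above can be walked to find the newest transcript of each page. This is a sketch under assumptions: it expects a JSON representation in which pages sit under `pageList.pages`, each page has `pageNr`, `url` and `tsList.transcripts`, and each transcript has `url`, `status` and a numeric `timestamp`. These field names are not given in the text and should be checked against a real response.

```python
import json


def latest_transcripts(fulldoc_json: str) -> dict[int, dict]:
    """Return, per page number, the image link plus the newest transcript's
    PAGE XML link and edit status.

    Assumed layout: {"pageList": {"pages": [{"pageNr", "url",
    "tsList": {"transcripts": [{"url", "status", "timestamp"}]}}]}}
    """
    doc = json.loads(fulldoc_json)
    result = {}
    for page in doc["pageList"]["pages"]:
        transcripts = page.get("tsList", {}).get("transcripts", [])
        if not transcripts:
            continue  # skip pages that have no transcript yet
        newest = max(transcripts, key=lambda t: t["timestamp"])
        result[page["pageNr"]] = {
            "image": page["url"],
            "xml": newest["url"],
            "status": newest["status"],
        }
    return result
```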

To update the transcription of a page, POST a new PAGE XML file to:

https://transkribus.eu/TrpServer/rest/collections/{collection-ID}/{doc-ID}/{pageNr}/text

The method accepts the following query parameters:

status: The edit status of the new transcript (see above for values)
overwrite: true or false. States whether the most recent version should be overwritten. Overwriting only works if the most recent version was saved by the same user and its edit status is not “NEW”.
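Building the update request can be sketched as follows: the PAGE XML goes into the request body, while `status` and `overwrite` are passed as query parameters. The `application/xml` content type and the helper name are assumptions, not taken from the text above.

```python
import urllib.parse
import urllib.request

BASE = "https://transkribus.eu/TrpServer/rest"


def build_text_update(collection_id: int, doc_id: int, page_nr: int,
                      page_xml: bytes, status: str, overwrite: bool,
                      session_id: str) -> urllib.request.Request:
    # status is an edit status such as NEW, IN_PROGRESS, DONE or FINAL.
    query = urllib.parse.urlencode(
        {"status": status, "overwrite": "true" if overwrite else "false"}
    )
    url = f"{BASE}/collections/{collection_id}/{doc_id}/{page_nr}/text?{query}"
    return urllib.request.Request(
        url,
        data=page_xml,  # the new PAGE XML document
        method="POST",
        headers={
            "Content-Type": "application/xml",  # assumption
            "Cookie": f"JSESSIONID={session_id}",
        },
    )
```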

Jobs

All processing tasks, such as document creation, layout analysis, HTR processing, OCR processing, etc., are run as threads (jobs) on the server. Each job has a status and can be monitored and cancelled. A job list can be retrieved at:

https://transkribus.eu/TrpServer/rest/jobs/list

Jobs are either persistent, e.g. document creation or HTR, or non-persistent, e.g. layout analysis. Persistent jobs are stored in the database, while non-persistent jobs are only kept in memory and are removed from the job list after one hour or when the server is restarted.

The details of a specific job can be retrieved at:

https://transkribus.eu/TrpServer/rest/jobs/{job-ID}
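Because jobs run asynchronously, a client typically polls the job details until the job ends. The sketch below keeps the HTTP call out of the loop by taking a hypothetical `fetch_state` callable, so it can be tested offline. The terminal state names `FINISHED`, `FAILED` and `CANCELED` are assumptions; the text above does not list the job states.

```python
import time
from typing import Callable

JOB_URL = "https://transkribus.eu/TrpServer/rest/jobs/{job_id}"
# Assumption: names of the states in which a job no longer changes.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def job_url(job_id: int) -> str:
    return JOB_URL.format(job_id=job_id)


def wait_for_job(job_id: int, fetch_state: Callable[[str], str],
                 interval: float = 5.0, timeout: float = 3600.0) -> str:
    """Poll the job details URL until a terminal state is reached.

    fetch_state(url) is a hypothetical callable that GETs the job details
    and returns the job's state string.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state(job_url(job_id))
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state} after {timeout} s")
        time.sleep(interval)
```

Passing the fetch function in, rather than calling `urlopen` inside the loop, also makes it easy to swap in a parser for whatever format the job details actually use.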

To cancel a job, send a POST request to:

https://transkribus.eu/TrpServer/rest/jobs/{job-ID}/kill

If a job processes multiple pages, e.g. a layout analysis or HTR job, and errors occurred on specific pages, the details can be queried with a GET request to:

https://transkribus.eu/TrpServer/rest/jobs/{job-ID}/errors
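The cancel and error calls differ only in path and HTTP method. A sketch with hypothetical helper names; sending the kill request without a body is an assumption.

```python
import urllib.request

BASE = "https://transkribus.eu/TrpServer/rest/jobs"


def kill_job_request(job_id: int, session_id: str) -> urllib.request.Request:
    # Cancel: POST to .../jobs/{job-ID}/kill.
    # Assumption: no request body is needed.
    return urllib.request.Request(
        f"{BASE}/{job_id}/kill",
        data=b"",
        method="POST",
        headers={"Cookie": f"JSESSIONID={session_id}"},
    )


def job_errors_request(job_id: int, session_id: str) -> urllib.request.Request:
    # Per-page error details: GET .../jobs/{job-ID}/errors.
    return urllib.request.Request(
        f"{BASE}/{job_id}/errors",
        headers={"Cookie": f"JSESSIONID={session_id}"},
    )
```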

OCR

https://transkribus.eu/TrpServer/rest/recognition/ocr?collId={collection-ID}&id={doc-ID}&pages={pageNr}

A POST to this path starts an OCR job for a specific document page. The response contains the job ID of the OCR job, which can be used to query the job status (see the Jobs section).
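Starting an OCR job and reading the job ID can be sketched as below. The sketch assumes the response body is the bare job ID as plain text; the text above only says the response contains the job ID, so adjust the parsing if the server returns XML or JSON. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request

OCR_URL = "https://transkribus.eu/TrpServer/rest/recognition/ocr"


def build_ocr_request(collection_id: int, doc_id: int, page_nr: int,
                      session_id: str) -> urllib.request.Request:
    query = urllib.parse.urlencode(
        {"collId": collection_id, "id": doc_id, "pages": page_nr}
    )
    return urllib.request.Request(
        f"{OCR_URL}?{query}",
        data=b"",  # assumption: all parameters travel in the query string
        method="POST",
        headers={"Cookie": f"JSESSIONID={session_id}"},
    )


def parse_job_id(body: bytes) -> int:
    # Assumption: the body is the job ID as plain text, e.g. b"12345".
    return int(body.decode("utf-8").strip())
```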

Search

Recognized text can be searched in the Solr index with a GET request to:

https://transkribus.eu/TrpServer/rest/search/fulltext?query=XXX&type=LinesLc&filter=collectionId:1234

Parameters:

query: the search string
type: where to search (“LinesLc” searches the recognized text, ignoring capitalization)
filter: filter options (by default, all collections the user has access to are searched; “collectionId:XXXX” restricts the results to the chosen collection)
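Because the query and filter values may contain spaces, colons or other reserved characters, they should be URL-encoded. A sketch using `urllib.parse.urlencode`; the function name is hypothetical, and the request would be sent as a GET with the session cookie like the other methods.

```python
import urllib.parse
from typing import Optional

SEARCH_URL = "https://transkribus.eu/TrpServer/rest/search/fulltext"


def build_search_url(query: str, collection_id: Optional[int] = None,
                     search_type: str = "LinesLc") -> str:
    params = {"query": query, "type": search_type}
    if collection_id is not None:
        # Restrict results to one collection; omit to search all accessible ones.
        params["filter"] = f"collectionId:{collection_id}"
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"
```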
