The Transkribus application communicates with the server via a set of RESTful service methods.
- The complete definition can be found in the file: application.wadl
- The definition can also be viewed in visualized form with Swadl
- The parameters for all the methods described below can be found in that service description file.
Most of the methods require the user to be logged in to the services, which is achieved by POSTing the user credentials to the login method:
The method returns an XML document containing the user profile and the collections that this user is allowed to access.
Subsequent requests to the service must then include the Session-ID from the XML either in:
- The HTTP header as cookie named “JSESSIONID”
- The request parameter “JSESSIONID”
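The two ways of passing the session ID can be sketched as follows. This is a minimal illustration that only builds the requests and does not send them. The base URL, the login path and the form field names `user` and `pw` are placeholders assumed for the example; the actual values are defined in application.wadl.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request

# Placeholders (assumptions): take the real base URL and login path from application.wadl.
BASE_URL = "https://example.org/rest"
LOGIN_PATH = "/login-path-from-wadl"


def build_login_request(user: str, password: str) -> Request:
    """Prepare a form-encoded POST carrying the credentials (field names are assumed)."""
    body = urlencode({"user": user, "pw": password}).encode("utf-8")
    return Request(BASE_URL + LOGIN_PATH, data=body, method="POST")


def with_session_cookie(url: str, session_id: str) -> Request:
    """Option 1: pass the session ID in the HTTP header as a cookie named JSESSIONID."""
    return Request(url, headers={"Cookie": f"JSESSIONID={session_id}"})


def with_session_param(url: str, session_id: str) -> Request:
    """Option 2: pass the session ID as a request parameter named JSESSIONID."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + [("JSESSIONID", session_id)]
    return Request(urlunsplit(parts._replace(query=urlencode(query))))
```

Either request object can then be passed to `urllib.request.urlopen`. The cookie variant keeps the session ID out of URLs, which may matter if URLs are logged.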
Once a user is authenticated, the collections to which the user has access rights can be listed via the following call:
The call returns a list of collections, where each entry contains the collection ID, the collection name and the role of the current user in that collection.
Documents in a collection can be listed analogously via a GET request:
This call returns a list of document metadata objects.
In order to retrieve a complete document, one has to GET the following:
The returned object consists of the document metadata and the complete page list, where each page contains, among other attributes:
- A link to the page image
- A list of transcript files, where each transcript contains:
  - A link to the page XML file
  - The responsible user’s ID and name
  - A timestamp
  - The edit status of the transcript, e.g. NEW, IN_PROGRESS, DONE, FINAL
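The page and transcript attributes listed above could be modelled on the client side roughly as follows. This is only a sketch: the class and field names, the integer timestamp and the `latest_transcript` helper are assumptions made for illustration and do not reflect the actual field names in the service response.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EditStatus(Enum):
    # The values named in the text; the service may define further ones.
    NEW = "NEW"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"
    FINAL = "FINAL"


@dataclass
class Transcript:
    xml_url: str        # link to the page XML file
    user_id: int        # responsible user's ID
    user_name: str      # responsible user's name
    timestamp: int      # assumed to be a sortable number, e.g. epoch milliseconds
    status: EditStatus  # edit status of this transcript


@dataclass
class Page:
    image_url: str      # link to the page image
    transcripts: List[Transcript] = field(default_factory=list)

    def latest_transcript(self) -> Optional[Transcript]:
        """Return the transcript with the highest timestamp, or None if there is none."""
        return max(self.transcripts, key=lambda t: t.timestamp, default=None)
```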
To update the transcription of a page, POST a new PAGE XML file for that page to the following path:
The method accepts the following query parameters:
- status: The edit status of the new transcript (see above for values).
- overwrite: true or false. States whether the most recent version should be overwritten. Overwriting only works if the most recent version was saved by the same user and the edit status is not “NEW”.
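The two query parameters and the overwrite rule can be sketched as below. The function names and the example page URL are hypothetical, and the sketch assumes that the edit status in the overwrite rule refers to the most recent stored version. The server remains the authority on whether an overwrite is accepted; the local check only mirrors the rule as described.

```python
from urllib.parse import urlencode


def transcript_post_url(page_url: str, status: str, overwrite: bool) -> str:
    """Append the status and overwrite query parameters to a page's transcript path."""
    query = urlencode({"status": status, "overwrite": "true" if overwrite else "false"})
    return f"{page_url}?{query}"


def can_overwrite(latest_user_id: int, latest_status: str, current_user_id: int) -> bool:
    """Mirror the documented rule: same user, and the edit status is not NEW."""
    return latest_user_id == current_user_id and latest_status != "NEW"


# Hypothetical page path; the real one is given in the service description.
url = transcript_post_url("https://example.org/rest/page-path", "IN_PROGRESS", True)
```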
All processing tasks, such as document creation, layout analysis, HTR processing, OCR processing, etc., are run as threads (jobs) on the server. Each job has a status and can be monitored and cancelled. A job list can be retrieved at:
Jobs are either persistent, e.g. document creation, HTR, etc., or non-persistent, e.g. layout analysis jobs. The former are stored in the database while the latter are only kept in memory and are removed from the job list after 1 hour or if the server is restarted.
The details of a specific job can be retrieved at:
In order to cancel a job, a POST request has to be sent to the following path:
If a job comprises the processing of multiple pages, e.g. a layout analysis or HTR process, and there were problems with specific pages, details can be queried with GET requests to:
A POST to this path starts an OCR job for a specific document page. The response contains the job ID of the OCR job, which can be used to query the job status (see Jobs section).
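Monitoring a job by its ID typically amounts to a polling loop. The sketch below keeps the HTTP call out of the picture by taking a caller-supplied `fetch_job` function, which is hypothetical and would wrap a GET on the job details path. The `state` key and the terminal state names are assumptions, since the text does not enumerate the job status values.

```python
import time
from typing import Callable, Dict

# Assumed terminal state names; adjust to the values the service actually returns.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def wait_for_job(
    job_id: str,
    fetch_job: Callable[[str], Dict[str, str]],
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch_job(job_id) repeatedly until the job reaches a terminal state."""
    for attempt in range(max_polls):
        job = fetch_job(job_id)
        if job["state"] in TERMINAL_STATES:
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Because non-persistent jobs are dropped from the job list after an hour or on a server restart, a real `fetch_job` should be prepared for a job that can no longer be found.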
Searching recognized text using the SOLR Index:
Parameters:
- query: the search string
- type: where to search (“LinesLc” searches the recognized text without taking capitalization into account)
- filter: filter options (by default, all collections the user has access to are searched; “collectionId:XXXX” restricts the results to the chosen collection)
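As an illustration, the three search parameters could be assembled into a URL-encoded query string like this. The helper name `build_search_query` is hypothetical, and the search path to which the query string is appended has to be taken from the service description.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_query(
    query: str,
    search_type: str = "LinesLc",
    collection_id: Optional[int] = None,
) -> str:
    """Encode the query, type and optional filter parameters as a query string."""
    params = {"query": query, "type": search_type}
    if collection_id is not None:
        # Restrict the search to a single collection.
        params["filter"] = f"collectionId:{collection_id}"
    return urlencode(params)


# Search all accessible collections, then only a single (made-up) collection.
everywhere = build_search_query("hello world")
one_collection = build_search_query("hello world", collection_id=42)
```

Omitting `collection_id` leaves the filter out entirely, which matches the default of searching every collection the user can access.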