SOLR Keyword Spotting | API

This search is only possible if the HTR output has been post-processed (typically by UPVLC; contact info@readcoop.eu with questions).

Searching for keywords via the SOLR index can be done via GET request to

https://transkribus.eu/TrpServer/rest/keyword

with the following parameters:

query string – the keyword to be searched
start int (default: 0) – index of the first result to return, counted from 0
rows int (default: 10) – number of successive results to fetch
- To process large numbers of hits, SOLR lets you start at a specific hit and return only the next N hits from there onward. This can be used to browse results page by page (e.g. the first page starts at 0 and shows 10 results, the next page starts at 10 and shows the next 10, etc.).
probL float – lower limit for keyword probability (usually between 0.0 and 1.0)
probH float – upper limit for keyword probability (usually 1.0)
- Each keyword is stored with a probability value. It is possible to limit searches to results above or below a certain probability. (Note: Currently, the keyword probabilities are stored directly as provided. To transform these probabilities into true relevance probabilities, a calibration function is required in the user interface.)
filter string – restricts search results to given field values (can be passed multiple times, as in …&filter=cId:1895&filter=id:4243_221_*…)
- Fields to filter by are:
- id: (string) index element id, consisting of the document id, the page number and a running number for the word on the page, separated by underscores. For example, 4432_15_10 is word 10 on page 15 of document 4432. Setting the filter to 4432_15_* limits the search to this document and page; *_20_* limits the search to page 20 of any document.
- title: (string) title of the document
- cId: (int) collection id
- auth: (string) name of the author
fuzzy int – accepts any integer value, but SOLR currently only supports values between 0 and 2
- SOLR can include results that differ from the keyword by a certain number of characters.
sorting string – sorts results by certain fields (usually “rp desc” to show results in order of descending probability)
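
The paging and filter rules above can be sketched in Python. This is only an illustration: the helper names (page_window, id_filter, query_string) are hypothetical and not part of the API, and the sketch assumes that start is counted from 0, as the default value suggests.

```python
from urllib.parse import urlencode


def page_window(page: int, rows: int = 10) -> dict:
    """Return the start/rows pair for a zero-based page number."""
    if page < 0 or rows <= 0:
        raise ValueError("page must be >= 0 and rows must be > 0")
    return {"start": page * rows, "rows": rows}


def id_filter(doc_id="*", page_nr="*", word_nr="*") -> str:
    """Build an id filter value such as 'id:4432_15_*' (hypothetical helper)."""
    return f"id:{doc_id}_{page_nr}_{word_nr}"


def query_string(query: str, page: int = 0, rows: int = 10, filters=()) -> str:
    """Assemble the query part of a keyword search URL."""
    params = [("query", query)]
    params += list(page_window(page, rows).items())
    # One "filter" parameter per filter value, as in &filter=...&filter=...
    params += [("filter", f) for f in filters]
    # Keep ':' and '*' unescaped so the filters read like the ones in the text.
    return urlencode(params, safe=":*")


print(query_string("london", page=2, filters=["cId:1895", id_filter(4243, 221)]))
```

Passing the parameters as a list of pairs rather than a dict is what allows the filter key to appear more than once.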

Example:

Searching for the keyword “london” in collection 1234 with any probability, displaying the first 100 results sorted by descending probability.

https://transkribus.eu/TrpServerTesting/rest/search/keyword?query=london&start=0&rows=100&probL=0.0&probH=1.0&filter=cId:1234&fuzzy=0&sorting=rp+desc
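
As a rough sketch, the example request could be assembled and sent from Python with the standard library alone. The function names are hypothetical; the endpoint is copied from the example URL and passed in as an argument so it can be swapped for another server. The sketch assumes no authentication is needed and that the response body is UTF-8 text; neither is stated on this page.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URL; adjust it to the server you use.
EXAMPLE_ENDPOINT = "https://transkribus.eu/TrpServerTesting/rest/search/keyword"


def build_keyword_url(endpoint, query, start=0, rows=10, prob_l=0.0,
                      prob_h=1.0, filters=(), fuzzy=0, sorting="rp desc"):
    """Build a keyword search URL from the documented parameters."""
    params = [("query", query), ("start", start), ("rows", rows),
              ("probL", prob_l), ("probH", prob_h)]
    params += [("filter", f) for f in filters]
    params += [("fuzzy", fuzzy), ("sorting", sorting)]
    # ':' and '*' stay unescaped; the space in "rp desc" becomes '+'.
    return endpoint + "?" + urlencode(params, safe=":*")


def fetch_keyword_hits(url, timeout=30):
    """Send the GET request and return the raw response body as text.

    Assumptions: no login is required and the body is UTF-8 encoded.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_keyword_url(EXAMPLE_ENDPOINT, "london", rows=100,
                        filters=["cId:1234"])
print(url)
```

Calling fetch_keyword_hits(url) would perform the actual request; it is left out of the script above so the sketch runs without network access.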
