P2PaLA

P2PaLA is a layout analysis tool that uses pre-trained models to recognize structure types at region level, as well as baselines, on a page.
The tool was developed by Lorenzo Quirós Díaz at the UPVLC in Valencia; the full open-source codebase is available at https://github.com/lquirosd/P2PaLA.

Recognition

Currently, recognition with pre-trained models is integrated into the Transkribus expert client (TranskribusX).
During recognition, P2PaLA creates new text regions for the structure types the model was trained on and, optionally, the baselines contained in those regions.
The table shows detailed information on all available models.
The column “Structure types” lists the region types a model recognizes, and the column “Baselines” shows whether the model was also trained to detect baselines.

Parameters

Rectify regions -> All regions are simplified to the bounding box of the recognized shape.
Min-Area -> After recognition, shapes whose *area* is smaller than this fraction of the image *width* are removed. Use this parameter to filter out small “garbage” regions. Default = 0.01
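To illustrate what these two parameters do, here is a minimal Python sketch of such a post-processing step. It is not P2PaLA's actual implementation: the function names are hypothetical, each region is assumed to be a plain list of (x, y) points, and the Min-Area threshold is computed from the image width exactly as described above. Filtering on the original shape before rectifying it is also an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def polygon_area(poly: Polygon) -> float:
    """Shoelace formula: area of a simple (non-self-intersecting) polygon."""
    n = len(poly)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def rectify(poly: Polygon) -> Polygon:
    """'Rectify regions': replace a shape by its axis-aligned bounding box."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def postprocess(
    regions: List[Polygon],
    image_width: float,
    min_area: float = 0.01,
    rectify_regions: bool = False,
) -> List[Polygon]:
    """Hypothetical post-processing: Min-Area filter, then optional rectification."""
    # Assumption: threshold is a fraction of the image width, as the text states.
    threshold = min_area * image_width
    kept = []
    for poly in regions:
        if polygon_area(poly) < threshold:
            continue  # drop small "garbage" shape
        kept.append(rectify(poly) if rectify_regions else poly)
    return kept
```

For example, with an image width of 2000 px and the default Min-Area of 0.01, the threshold is 20, so a 3 × 3 shape (area 9) would be dropped while larger regions are kept.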

Training

If you are interested in training your own models, please send us an email (info@readcoop.eu) and we can enable the training interface for you.
Trained models are associated with the currently selected collection, and their owner is set to the user who started the training.
When tagging regions, avoid overlaps between different regions.
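A quick way to spot overlapping regions before training is to compare their bounding boxes. The sketch below is a hypothetical helper, not part of P2PaLA or Transkribus; it assumes each tagged region is available as an ID plus a list of (x, y) polygon points. Because it only compares bounding boxes, it can flag pairs of non-rectangular regions whose actual polygons do not overlap, so treat its output as a list of candidates to inspect.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bounding_box(poly: List[Point]) -> Box:
    """Axis-aligned bounding box of a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the boxes share a positive area; touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_overlaps(regions: Dict[str, List[Point]]) -> List[Tuple[str, str]]:
    """Hypothetical check: return ID pairs whose bounding boxes overlap."""
    boxes = {rid: bounding_box(poly) for rid, poly in regions.items()}
    return [
        (r1, r2)
        for r1, r2 in combinations(sorted(boxes), 2)
        if boxes_overlap(boxes[r1], boxes[r2])
    ]
```

For example, a heading region spanning y = 0 to 20 and a paragraph region starting at y = 15 would be reported as an overlapping pair, while a paragraph starting exactly at y = 20 would not.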
Baseline training only makes sense if your dataset is large enough, i.e. contains at least 500 to 1000 corrected baselines.
For structure type recognition, a training set of about 50 to 100 pages should be enough to produce a decent model, depending, of course, on the complexity of your layouts.
Dataset balancing: it is always a good idea to keep your dataset balanced, i.e. to have approximately the same number of samples for every structure type. Otherwise, structure types with fewer samples can get suppressed, especially the more epochs you train.
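The rules of thumb above can be turned into a simple pre-training check. This hypothetical sketch is not part of Transkribus; it assumes you have collected one structure type label per tagged region in your training set. The page and baseline minimums are the lower ends of the ranges quoted above, and the 0.5 tolerance for “approximately the same” is an arbitrary illustrative choice, not a P2PaLA setting.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Lower ends of the rule-of-thumb ranges quoted above.
MIN_PAGES_STRUCTURES = 50
MIN_BASELINES = 500


def underrepresented_types(labels: Iterable[str], tolerance: float = 0.5) -> Dict[str, int]:
    """Structure types whose sample count is below `tolerance` x the largest count."""
    counts = Counter(labels)
    if not counts:
        return {}
    largest = max(counts.values())
    return {t: n for t, n in counts.items() if n < tolerance * largest}


def dataset_warnings(
    labels: List[str], n_pages: int, n_baselines: int, train_baselines: bool
) -> List[str]:
    """Hypothetical readiness check based on the guidelines above."""
    warnings = []
    if n_pages < MIN_PAGES_STRUCTURES:
        warnings.append(f"only {n_pages} pages; about 50-100 are suggested")
    if train_baselines and n_baselines < MIN_BASELINES:
        warnings.append(f"only {n_baselines} corrected baselines; at least 500-1000 are suggested")
    for t, n in sorted(underrepresented_types(labels).items()):
        warnings.append(f"structure type '{t}' has only {n} samples")
    return warnings
```

For example, a set with 40 paragraph, 30 heading and 5 marginalia regions would flag “marginalia” as underrepresented, since 5 is less than half of 40.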
Please note that the tool can only recognize structure types that are distinguishable on a page in some way, either visually or by their position.
Also note that P2PaLA is not yet a production-ready tool, so please don’t expect ‘perfect’ results.

Training Parameters

See P2PaLA Train Parameters
