
Update on table processing

Back in April we appealed for help in generating a new data set that could be used to improve the automated layout analysis of historical documents set out in tables.  We asked, and you answered!

Thanks to submissions from our network, READ researchers at the Computer Vision Lab at the Technical University of Vienna, Naver Labs Europe and the Passau Diocesan Archives have been compiling a sizeable collection of images of historical documents containing tables.

We now have a total of around 1,500 images from 25 contributors all around the world.  The submitted sources show a great variety of tables, from hand-drawn accounting books to stock exchange lists and train timetables, and from record books to prisoner lists, simple printed tables in books, production censuses and many, many more.

READ researchers are preparing the data set as the basis for a computer science research competition in early 2019 (more details coming soon!).  This collection will be used to evaluate different approaches to the automated recognition of tables.

There is still a lot for us to learn about what constitutes a table.  Working with this heterogeneous data should help us to move beyond the specifics and come up with some generic guidelines and techniques for processing these kinds of pages.

We are very thankful to our network for delivering such a variety of tabular data and we look forward to sharing our next progress report!

Screenshot of 1937 Irish Census in Transkribus.  Image courtesy of National University of Ireland, Galway.
