The starting point for any kind of document digitization, whether done by hand or by sophisticated text recognition algorithms, is a good-quality image. Take a look at the one below. It is a scan of the US Declaration of Independence – but not of the original. The original has suffered badly from improper storage and is quite washed out today. The image below shows a facsimile created by William Stone in 1823, which has become the most commonly reproduced copy of the Declaration. How Stone managed to create such a precise clone of the original parchment remains something of a mystery, but thanks to him we still have an easily readable version of this historic document.
Below is a small, low-resolution section of the main text. A human can still identify most letters thanks to context, but doing so for an unfamiliar text would be tedious, and we can imagine that HTR algorithms will not be too happy with this kind of input either, once the resolution drops too low. This raises a few questions: What if the original paper is lost or degraded and all that remains is a poor-quality digital scan? Or what if one has already scanned ten thousand pages, only to find that the text on some of them is so small that the resolution is no longer sufficient? Do we have to scan everything again and stretch already tight storage budgets even further? Maybe not.
There are several classical techniques for improving such a pixelated mess. The basic task is always to add new pixels between the existing ones; the question is how to choose their values. The nearest-neighbour method simply copies the closest original pixel. Bilinear interpolation computes a weighted average of the surrounding original pixels according to the new pixel’s position. Bicubic interpolation takes this up another notch by fitting a smooth cubic function through a larger neighbourhood of pixels. Alas, all of these methods suffer from a fundamental shortcoming: they cannot add new information to an image. Where a human might imagine a sharp line or a closed loop thanks to the surrounding context, these classical techniques only follow comparatively simple rules. This is where artificial neural networks can come in handy.
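To make the difference concrete, here is a minimal sketch of the first two methods in plain NumPy, applied to a tiny greyscale “image”. The function names and the 2×2 example array are illustrative, not part of any particular library.

```python
import numpy as np

def upscale_nearest(img, factor):
    """Nearest-neighbour: each new pixel copies the closest original pixel."""
    h, w = img.shape
    rows = np.arange(h * factor) // factor
    cols = np.arange(w * factor) // factor
    return img[np.ix_(rows, cols)]

def upscale_bilinear(img, factor):
    """Bilinear: each new pixel is a weighted average of the four
    surrounding original pixels, weighted by its position between them."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * factor)   # new pixel coordinates,
    xs = np.linspace(0, w - 1, w * factor)   # expressed in the original grid
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

tiny = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
print(upscale_nearest(tiny, 2))   # hard blocks: pixels are only copied
print(upscale_bilinear(tiny, 2))  # smooth ramps: values are blended
```

Nearest-neighbour produces blocky copies, bilinear produces smooth gradients – but neither can invent a stroke or a serif that was lost in the low-resolution original.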
Last year, NVIDIA released an updated version of their Deep Learning Super Sampling algorithm, or DLSS for short. It turns out that deep learning models are now so good at improving images that they can be used to boost the performance of real-time applications: rendering frames at a lower resolution and then running them through a neural network is faster than rendering them at high resolution in the first place, with almost no perceivable reduction in image quality.
Unfortunately for us, upscaling real-time computer graphics comes with certain advantages that document scans lack. For example, one usually has several images in a sequence, from which information lost in any individual frame can be recovered. One can also use extra information provided by the rendering engine, such as motion vectors or even object stencils. When dealing with scanned pages of old documents, we have none of these things. We only have one image, and any extra information has to be “imagined”.

Fortunately, this is an area where AI has excelled as well. This particular subfield has made heavy use of so-called Generative Adversarial Networks (GANs), and while they are not yet widely used in production environments, they show remarkable potential. They work by employing two separate neural networks: a generator and a discriminator. In the most common setup, the generator creates new images, while the discriminator tries to spot the fakes among real images from a training dataset. The training process is a zero-sum game in which one network gets better at faking images while the other gets better at identifying fakes. Trained long enough, GANs have been shown to produce photorealistic results. If we wanted to create completely new images, we would essentially feed random data to the generator. That is very interesting for artists and content creators, but we want to improve existing images. To do that, we need a slightly modified setup, for which we took a closer look at the architecture described in this paper: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. The details are a bit too involved for this post, but the results speak for themselves.
One particularly interesting feature of this model is that it was never trained on handwritten text. It was trained on the DIV2K dataset, which contains a wide variety of high-resolution color images showing all sorts of objects and sceneries – but no images of text.
We expect that in the future, with more specific training, this technology could improve readability not just for humans but also for HTR models, and perhaps even reduce storage or bandwidth requirements. Stay tuned for future updates and other insights into our technology development at readcoop.eu/insights.