A demonstration · Datalab Surya OCR 2 (650M)

The Recovered Page: reading 1900s–1930s European newspapers with a small open model

A 650-million-parameter model transcribes historic newspaper scans from Europeana across seven languages and two scripts. Each page below shows the original scan beside the model’s reading-order transcription. Toggle to the original Europeana OCR to compare against the decades-old text, or reveal the layout blocks the model recovered.


Layout block legend: header / title, text, table, figure, reading order