OCR Correction Tool for Linguistic Corpora

We introduce a new tool for correcting OCR errors in materials held in a repository of cultural materials. The poster is aimed at everyone interested in digital humanities who might find our tool useful. It focuses on the OCR correction tool and its background processes.

We have started a project on materials published in Finno-Ugric languages in the Soviet Union in the 1920s and 1930s. The materials are digitised in Russia. As they arrive, we publish them in DSpace (fennougrica.kansalliskirjasto.fi).

For research purposes, the results of the OCR must be corrected manually. For this we have built a new tool. Although similar tools exist, we found in-house development necessary in order to serve the researchers' needs.

The tool enables exporting the corrected text as required by the researchers. It makes it possible to distribute the correction tasks and their supervision. After a supervisor has approved a text as finalised, the new version of the work will replace the old one in DSpace.
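The correction workflow described above (assign, correct, approve, replace) can be pictured with a minimal Python sketch. It is an illustration only, not the tool's actual implementation: the class `CorrectionTask`, its fields, the `Status` values and the in-memory `repository` dictionary standing in for DSpace are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical workflow states; the real tool may use different ones.
    ASSIGNED = "assigned"
    SUBMITTED = "submitted"
    APPROVED = "approved"


@dataclass
class CorrectionTask:
    """One work assigned to a corrector and overseen by a supervisor."""
    work_id: str
    corrector: str
    supervisor: str
    corrected_text: Optional[str] = None
    status: Status = Status.ASSIGNED

    def submit(self, user: str, text: str) -> None:
        # Only the assigned corrector may hand in a correction.
        if user != self.corrector or self.status is not Status.ASSIGNED:
            raise PermissionError("task is not open for this user")
        self.corrected_text = text
        self.status = Status.SUBMITTED

    def approve(self, user: str) -> str:
        # Only the supervisor may finalise a submitted correction.
        if user != self.supervisor or self.status is not Status.SUBMITTED:
            raise PermissionError("task cannot be approved by this user")
        self.status = Status.APPROVED
        return self.corrected_text or ""


# A plain dictionary stands in for the repository (assumption: the real
# tool replaces the stored version in DSpace at this point).
repository = {"work-1": "Tbe raw 0CR text"}

task = CorrectionTask(work_id="work-1", corrector="anna", supervisor="pekka")
task.submit("anna", "The raw OCR text")

# The old version is replaced only after the supervisor approves.
repository[task.work_id] = task.approve("pekka")
```

The point of the sketch is the ordering: the stored version changes only once a supervisor has approved the corrected text.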

The project has
- benefitted the small language communities,
- opened channels for cooperation in Russia,
- increased our capabilities in digital humanities.

The OCR correction tool will be available to others.

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Posters, Demos and Developer "How-To's"
