OCR for bilingual documents using language modeling

A. Ray; S. Rajeswar; Santanu Chaudhury

doi:10.1109/ICDAR.2015.7333965

Published in IEEE Computer Society

2015

Volume: 2015-November

Pages: 1256 - 1260

Abstract

Script-based features are highly discriminative for text segmentation and recognition, and are therefore widely used in Optical Character Recognition (OCR). However, relying on script-dependent features prevents such architectures from being adapted directly to another script. Script-independent systems solve this problem to a certain extent for monolingual documents, but the problem is aggravated for multilingual documents, because it is very difficult for a single classifier to learn many scripts. Generally, a script identification module identifies text segments and the corresponding script-dependent classifier is selected. This paper presents a unified framework that combines a language model with multiple preprocessing hypotheses for word recognition from bilingual document images. Prior to text recognition, preprocessing steps such as binarization and segmentation are required, but these steps introduce a large combinatorial error that propagates to the final recognition accuracy. In this paper we use multiple preprocessing routines as alternative hypotheses and use a language model to verify each alternative and choose the best recognized sequence. We test this architecture on word recognition for Kannada-English and Telugu-English bilingual documents and achieve better recognition rates than single methods using the same classifier. © 2015 IEEE.
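
The selection step described in the abstract (several preprocessing routes produce alternative recognized strings, and a language model picks the most plausible one) can be illustrated with a minimal sketch. This is not the authors' implementation: the character bigram model with add-one smoothing, the names `CharBigramLM` and `pick_best`, the route labels, and the toy corpus are all assumptions made for illustration. The abstract does not specify the language model or the preprocessing routines used in the paper.

```python
import math
from collections import Counter


class CharBigramLM:
    """Character bigram model with add-one smoothing (illustrative assumption)."""

    def __init__(self, corpus_words):
        self.bigrams = Counter()   # counts of (previous char, next char)
        self.contexts = Counter()  # counts of previous char
        self.vocab = set()
        for word in corpus_words:
            padded = "^" + word + "$"  # ^ and $ mark word start and end
            self.vocab.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def log_prob(self, word):
        """Average smoothed log-probability per character transition."""
        padded = "^" + word + "$"
        v = len(self.vocab)
        total = 0.0
        for a, b in zip(padded, padded[1:]):
            total += math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + v))
        # Normalize by the number of transitions so that shorter
        # hypotheses are not favored merely for being short.
        return total / (len(padded) - 1)


def pick_best(hypotheses, lm):
    """Return the (route, text) pair whose text the language model scores highest.

    hypotheses maps a hypothetical preprocessing-route label to the string
    the classifier recognized after that route.
    """
    return max(hypotheses.items(), key=lambda kv: lm.log_prob(kv[1]))


# Toy corpus; a bilingual system would presumably train on words from both languages.
lm = CharBigramLM(["recognition", "language", "document", "model", "bilingual"])

# Hypothetical outputs of the same classifier after three preprocessing routes.
hypotheses = {
    "global_threshold": "docurnent",
    "adaptive_threshold": "document",
    "alternate_segmentation": "d0cument",
}
best_route, best_text = pick_best(hypotheses, lm)
```

In this toy setup the hypothesis containing only character transitions seen in the corpus scores highest, so the language model acts as the verifier that arbitrates between preprocessing alternatives rather than committing to a single binarization or segmentation up front.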

Topics: Intelligent character recognition (62%), Optical character recognition (60%), Language model (58%), Text segmentation (58%) and Classifier (UML) (54%)

About the journal

Journal: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Publisher: IEEE Computer Society
ISSN: 1520-5363

Authors (1)

Santanu Chaudhury
- Department of Computer Science & Engineering
