Probabilistic approach for correction of optically-character-recognized strings using suffix tree

R. Jain; Santanu Chaudhury

doi:10.1109/NCVPRIPG.2011.24


Published in 2011

Pages: 74 - 77

Abstract

In this paper we present an approach for correcting the character recognition errors of an OCR system that recognises Indic scripts. A suffix tree indexes the lexicon in lexicographical order to facilitate probabilistic search. To obtain the most probable match for a mis-recognised string, the string is compared with the substrings stored on the edges of the suffix tree, using a weighted Levenshtein distance as the similarity measure, in which the confusion probabilities of characters (Unicode code points) serve as substitution costs. The comparison continues until the accumulated cost exceeds the specified cost k. Retrieved candidates are sorted by edit cost, and those with the lowest cost are selected. Using this information, the system can correct non-word errors and achieves a maximum error rate reduction of 33% over a simple character recognition system. © 2011 IEEE.
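The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it scans a plain list of lexicon words instead of traversing a suffix tree, and it assumes a substitution cost of 1 minus the confusion probability, with unit cost for insertions and deletions. The abstract does not give the exact cost formula, so these choices, along with the names `weighted_levenshtein`, `correct`, and `confusion`, are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: confusion[(observed_char, true_char)] is the probability
# that the OCR emits observed_char when the true character is true_char.
Confusion = Dict[Tuple[str, str], float]


def weighted_levenshtein(observed: str, candidate: str,
                         confusion: Confusion, k: float) -> float:
    """Weighted edit cost between observed and candidate, or inf if it exceeds k."""
    # Row 0: turning the empty prefix into candidate[:j] costs j insertions.
    prev = [float(j) for j in range(len(candidate) + 1)]
    for i, o in enumerate(observed, start=1):
        curr = [float(i)]  # i deletions to reach the empty candidate prefix
        for j, c in enumerate(candidate, start=1):
            # Assumption: likelier confusions are cheaper substitutions.
            sub = 0.0 if o == c else 1.0 - confusion.get((o, c), 0.0)
            curr.append(min(prev[j] + 1.0,        # delete o
                            curr[j - 1] + 1.0,    # insert c
                            prev[j - 1] + sub))   # substitute o -> c
        # All costs are non-negative, so once every cell in a row exceeds k
        # the final cost must exceed k as well; stop early.
        if min(curr) > k:
            return float("inf")
        prev = curr
    return prev[-1] if prev[-1] <= k else float("inf")


def correct(observed: str, lexicon: List[str],
            confusion: Confusion, k: float) -> List[Tuple[str, float]]:
    """Return lexicon words within cost k of observed, lowest cost first."""
    scored = []
    for word in lexicon:
        cost = weighted_levenshtein(observed, word, confusion, k)
        if cost <= k:
            scored.append((cost, word))
    scored.sort()
    return [(word, cost) for cost, word in scored]


if __name__ == "__main__":
    # Toy Latin-script data for readability; the paper targets Indic scripts.
    toy_confusion = {("s", "a"): 0.8}
    print(correct("cst", ["dog", "cot", "cat"], toy_confusion, k=1.5))
```

With this toy confusion table, "cat" ranks ahead of "cot" because the substitution of "s" for "a" is marked as a likely OCR confusion, while "dog" is abandoned once its cost passes k. A suffix tree or trie index would share the row computations across words with common prefixes, which the linear scan here does not attempt.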

Topics: Suffix tree (67%), Generalized suffix tree (66%), Compressed suffix array (64%), Levenshtein distance (59%), and String (computer science) (56%)

About the journal

Journal: Proceedings - 2011 3rd National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG 2011

Authors (1)

Santanu Chaudhury
- Department of Computer Science & Engineering
