Improving Classical OCRs for Brahmic Scripts Using Script Grammar Learning
D. Ganguly, S. Agarwal,
Published in IEEE Computer Society
Volume: 7
Pages: 37 - 41
Classical OCRs based on isolated character (symbol) recognition were the fundamental way of generating textual representations, particularly for Indian scripts, until transcription-based approaches gained momentum. Though the former approaches have been criticized as prone to failure, their accuracy has nevertheless been fairly decent in comparison with the newer transcription-based approaches. An analysis of isolated character recognition OCRs for Hindi and Bangla revealed that most errors were generated in converting the output of the classifier to valid Unicode sequences, i.e., script grammar generation. Linguistic rules for generating scripts are inadequately integrated, resulting in a rigid Unicode generation scheme that is cumbersome to understand and error-prone when adapted to new Indian scripts. In this paper we propose a machine learning-based scheme for generating Unicode from classifier symbols, which outperforms the existing generation scheme and improves accuracy for Devanagari and Bangla scripts. © 2017 IEEE.
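As a rough illustration of the script grammar step the abstract refers to, the sketch below shows one hand-written reordering rule of the rigid kind the abstract criticizes. It is not the learned scheme proposed in the paper. In Devanagari, the vowel sign I (U+093F) is drawn to the left of its consonant but stored after it in Unicode, so a classifier that emits symbols in visual left-to-right order needs a reordering pass. The function name `visual_to_logical` and the symbol representation (one Unicode code point per classifier symbol) are assumptions made for this example.

```python
# Hypothetical illustration only: one hand-written rule, not the paper's method.
# Assumption: each classifier symbol is already a single Unicode code point,
# emitted in visual (left-to-right) order.

DEVANAGARI_KA = "\u0915"   # consonant KA
VOWEL_SIGN_AA = "\u093E"   # drawn to the right of the consonant
VOWEL_SIGN_I = "\u093F"    # drawn to the LEFT of the consonant, stored after it

# Signs that appear before the base consonant visually but follow it in Unicode.
PRE_BASE_SIGNS = {VOWEL_SIGN_I}


def visual_to_logical(symbols):
    """Reorder visually ordered symbols into Unicode logical order."""
    out = []
    pending = None  # a pre-base sign waiting for the symbol it attaches to
    for sym in symbols:
        if sym in PRE_BASE_SIGNS:
            if pending is not None:
                out.append(pending)  # no base followed the earlier sign
            pending = sym
        else:
            out.append(sym)
            if pending is not None:
                out.append(pending)  # place the sign after its base
                pending = None
    if pending is not None:
        out.append(pending)  # trailing sign with no base; keep it as-is
    return "".join(out)


# Visual order: sign I, then KA. Logical order: KA, then sign I.
print(visual_to_logical([VOWEL_SIGN_I, DEVANAGARI_KA]))
```

This rule attaches the sign to the very next symbol, so it would misplace the sign ahead of a consonant conjunct, where the sign belongs after the whole cluster. Each such case needs another hand-written rule, which is the kind of rigidity that motivates learning the mapping instead.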
About the journal
Journal: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Publisher: IEEE Computer Society