Trainable script identification strategies for Indian languages

Santanu Chaudhury; R. Sheth

doi:10.1109/ICDAR.1999.791873

Published in IEEE Computer Society

1999

Pages: 661 - 664

Abstract

Identifying the script in an image of a document page is of primary importance for a system that processes multilingual documents. In this paper, three trainable classification schemes are proposed for the identification of Indian scripts. The first scheme is based on a frequency-domain representation of the horizontal profile of the textual blocks. The other two schemes use connected components extracted from the textual region. We propose a novel Gabor filter-based feature extraction scheme for the connected components. We also find that the frequency distribution of the width-to-height ratio of the connected components can be used for script recognition. Experiments show that the Gabor filter-based scheme provides the most reliable performance, whereas the other two techniques are computationally more efficient. © 1999 IEEE.
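The abstract does not give implementation details, so the following is only a minimal sketch of what two of the three feature families could look like, under stated assumptions. It is not the authors' method: the function names, the number of retained coefficients, the bin count and the ratio cap are all hypothetical choices made for illustration. The first function takes the Fourier magnitude of a text block's row-wise ink profile; the second builds a normalized histogram of width-to-height ratios from connected-component bounding boxes, which are assumed to be supplied by some earlier step. The Gabor filter-based scheme is not sketched here.

```python
import numpy as np


def horizontal_profile_spectrum(block, n_coeffs=16):
    """Frequency-domain feature of a text block's horizontal profile.

    block: 2D array in which nonzero pixels are ink (assumption).
    n_coeffs: number of low-frequency magnitudes to keep (hypothetical).
    """
    ink = (np.asarray(block) > 0).astype(float)
    profile = ink.sum(axis=1)             # ink pixels per image row
    profile = profile - profile.mean()    # remove the constant (DC) offset
    spectrum = np.abs(np.fft.rfft(profile))
    feats = np.zeros(n_coeffs)
    k = min(n_coeffs, spectrum.size)
    feats[:k] = spectrum[:k]
    norm = np.linalg.norm(feats)
    return feats / norm if norm > 0 else feats


def aspect_ratio_histogram(boxes, bins=8, max_ratio=4.0):
    """Normalized histogram of width-to-height ratios.

    boxes: iterable of (width, height) pairs for connected components,
    assumed to come from a separate component-extraction step.
    Ratios above max_ratio are clipped into the last bin.
    """
    ratios = np.array([w / h for w, h in boxes if h > 0], dtype=float)
    hist, _ = np.histogram(np.clip(ratios, 0.0, max_ratio),
                           bins=bins, range=(0.0, max_ratio))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)
```

Either feature vector could then be passed to any trainable classifier. Both are cheap to compute compared with filtering every component with a bank of Gabor filters, which is consistent with the efficiency trade-off the abstract reports.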

Topics: Gabor filter (61%), Feature extraction (53%), Optical character recognition (51%), Contextual image classification (50%), and Identification (information) (50%)

About the journal

Journal	Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Publisher	IEEE Computer Society
ISSN	1520-5363

Authors (1)

Santanu Chaudhury
- Department of Computer Science & Engineering
