DocDescribor: Digits + Alphabets + Math Symbols - A Complete OCR for Handwritten Documents
R. Aggarwal, H. Jain, ,
Published in Springer Science and Business Media Deutschland GmbH
2020
Volume: 1249
Pages: 292-301
Abstract
This paper presents an Optical Character Recognition (OCR) system for documents containing English text and mathematical expressions. Neural network architectures built from CNN layers and/or dense layers achieve high accuracy on character recognition tasks. However, these models require a large amount of training data, with a balanced number of samples for each class. Recognition of mathematical symbols is challenging because the available training data is both imbalanced and scarce. To address this issue, we pose the character recognition problem as a Distance Metric Learning problem. We propose a Siamese-CNN network that learns discriminative features to identify whether the two images in a pair contain similar or dissimilar characters. The network is then used to recognize characters by character matching: test images are compared to sample images of any target class, which may or may not have been included during training. Thus our model can scale to new symbols easily. The proposed approach is invariant to the author’s handwriting. Our model has been tested on images extracted from a dataset of scanned answer scripts that we collected. Our approach achieves performance comparable to other architectures using convolutional or dense layers while using less training data. © 2020, Springer Nature Singapore Pte Ltd.
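The character-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the loss function or the matching rule, so the contrastive loss, the nearest-exemplar rule, the function names, and the toy 2-D embeddings (standing in for the output of a trained Siamese branch) are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(dist, same, margin=1.0):
    """Contrastive loss for one pair (an assumed choice of loss).

    Similar pairs are penalised by their squared distance; dissimilar
    pairs are penalised only when closer than `margin`.
    """
    if same:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

def classify(query, exemplars):
    """Return the label of the exemplar embedding nearest to `query`.

    `exemplars` maps a class label to a list of embedding vectors.
    """
    best_label, best_dist = None, float("inf")
    for label, vectors in exemplars.items():
        for v in vectors:
            d = euclidean(query, v)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label

# Hypothetical embeddings; a real system would obtain them from the network.
exemplars = {"x": [[0.0, 0.0]], "+": [[1.0, 1.0]]}
# A new symbol class is added by supplying an exemplar, with no retraining.
exemplars["sigma"] = [[3.0, 0.0]]
prediction = classify([2.8, 0.1], exemplars)
```

Because classification is a lookup against exemplar embeddings, adding a symbol only requires adding an entry to `exemplars`, which mirrors the abstract's claim that the model scales to classes not seen during training.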
About the journal
Journal: Communications in Computer and Information Science
Publisher: Springer Science and Business Media Deutschland GmbH
ISSN: 1865-0929
Open Access: No