DocDescribor: Digits + Alphabets + Math Symbols - A Complete OCR for Handwritten Documents

R. Aggarwal; H. Jain; Gaurav Harit; Anil Kumar Tiwari

doi:10.1007/978-981-15-8697-2_27

Published in Springer Science and Business Media Deutschland GmbH

2020

DOI: 10.1007/978-981-15-8697-2_27

Volume: 1249

Pages: 292 - 301

Abstract

This paper presents an Optical Character Recognition (OCR) system for documents with English text and mathematical expressions. Neural network architectures built from CNN layers and/or dense layers achieve high accuracy on character recognition tasks. However, these models require a large amount of training data, with a balanced number of samples for each class. Recognizing mathematical symbols is challenging because the available training data are both scarce and imbalanced. To address this issue, we pose character recognition as a Distance Metric Learning problem. We propose a Siamese-CNN network that learns discriminative features to decide whether the two images in a pair contain similar or dissimilar characters. The network is then used to recognize characters by matching: a test image is compared with sample images of any target class, which may or may not have been seen during training. Thus our model can scale to new symbols easily. The proposed approach is invariant to the writer's handwriting. Our model has been tested on images extracted from a dataset of scanned answer scripts that we collected. Our approach achieves performance comparable to other architectures using convolutional or dense layers while using less training data. © 2020, Springer Nature Singapore Pte Ltd.
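The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a loss function, so the contrastive loss used here is an assumption, and the two-dimensional vectors are hypothetical stand-ins for the embeddings a trained Siamese-CNN would produce. The function names `euclidean`, `contrastive_loss`, and `classify_by_matching` are invented for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair (an assumed loss, not taken from the paper).

    Pairs showing the same character are pulled together; pairs showing
    different characters are pushed apart until they are `margin` away.
    """
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


def classify_by_matching(query_emb, references):
    """Return (label, distance) of the reference embedding closest to the query.

    `references` maps a class label to a list of sample embeddings. Classes can
    be added to this mapping without retraining the embedding network.
    """
    best_label, best_dist = None, math.inf
    for label, samples in references.items():
        for sample in samples:
            d = euclidean(query_emb, sample)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical embeddings standing in for Siamese-CNN outputs.
references = {"7": [[0.0, 1.0]], "x": [[1.0, 0.0]]}

# A new symbol is supported by adding sample embeddings, with no retraining.
references["sum"] = [[1.0, 1.0]]

label, dist = classify_by_matching([0.9, 0.95], references)
print(label, round(dist, 3))
```

Because classification reduces to a nearest-reference lookup in embedding space, supporting a new symbol only requires a few sample images of it, which is the scaling property the abstract claims.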

Topics: Optical character recognition (60%) and Character (computing) (52%)

About the journal

Journal	Communications in Computer and Information Science
Publisher	Springer Science and Business Media Deutschland GmbH
ISSN	1865-0929
Open Access	No

Authors (2)

Gaurav Harit
- Department of Computer Science & Engineering
Anil Kumar Tiwari
- Department of Electrical Engineering
