High performance layout analysis of medieval European document images
S.S. Bukhari, , A. Dengel,
Published in SciTePress
2018
Volume: 2018-January
   
Pages: 324 - 331
Abstract
Layout analysis, which mainly comprises binarization and page segmentation, is one of the most important performance-determining steps of an OCR system for complex medieval document images, which contain noise, distortions and irregular layouts. In this paper, we present high performance page segmentation techniques for medieval European document images, which include a novel main-body and side-notes segregation and an improved version of OCRopus (OCRopus,) based text line extraction. To complete the high performance layout analysis pipeline, we also present the application of the percentile based binarization (Afzal et al., 2014) and the multiresolution morphology based text and non-text segmentation (Bukhari et al., 2011) methods to historical document images. The presented layout analysis techniques are applied to a collection of 15th-century Latin document images, achieving more than 90% accuracy for each of the segmentation techniques. Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
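The abstract names percentile based binarization without describing it. As a rough illustration only, and not the method of Afzal et al. (2014), the general idea of thresholding against percentile estimates of the ink and paper levels might be sketched as follows. The function names, the percentile values and the global (non-local) treatment are all assumptions made for this sketch.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values, nearest-rank on sorted data."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    # Nearest-rank index into the sorted list.
    idx = int(round((q / 100.0) * (len(ordered) - 1)))
    return ordered[idx]


def binarize_percentile(image, low_q=5, high_q=90, threshold=0.5):
    """Binarize a grayscale image given as rows of intensities (0 = dark, 255 = bright).

    Hypothetical simplification: one global ink level and one global paper
    level are estimated from percentiles of all pixels. Returns rows of
    1 (ink) and 0 (background).
    """
    pixels = [p for row in image for p in row]
    black = percentile(pixels, low_q)    # assumed estimate of the ink level
    white = percentile(pixels, high_q)   # assumed estimate of the paper level
    span = max(white - black, 1e-9)      # avoid division by zero
    # Rescale each pixel to roughly [0, 1] and mark the dark side as ink.
    return [
        [1 if (p - black) / span < threshold else 0 for p in row]
        for row in image
    ]


page = [
    [200, 210, 30],
    [205, 40, 215],
    [220, 198, 202],
]
print(binarize_percentile(page))
```

Estimating the levels from percentiles rather than from the minimum and maximum makes the rescaling less sensitive to a few extreme pixels, which is one plausible reason to prefer it on noisy historical scans. A practical system would likely estimate these levels locally rather than once per page.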
About the journal
Journal: ICPRAM 2018 - Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods
Publisher: SciTePress