This paper presents a new model-based document image segmentation scheme that uses XML DTDs (Extensible Markup Language Document Type Definitions). Given a document image, the algorithm can select the appropriate model. A new wavelet-based tool has been designed for distinguishing text from non-text regions and for characterizing font sizes. Our model-based analysis scheme uses this tool to identify the logical components of a document image. © 2001 IEEE.
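
The abstract does not describe how the wavelet-based tool works internally, so the following is only a minimal sketch of one common idea, not the authors' method. It assumes that text regions show more high-frequency energy than flat background, and it measures that energy with a single-level 2D Haar transform over fixed-size blocks. The function names, the block size, and the threshold are all hypothetical choices made for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image, pixel values assumed in [0, 1]


def haar_detail_energy(block: Image) -> float:
    """Mean squared single-level 2D Haar detail coefficient of a block.

    The block must have even height and width. Only the three detail
    responses per 2x2 cell are used; the average (approximation) is ignored.
    """
    rows, cols = len(block), len(block[0])
    if rows % 2 or cols % 2:
        raise ValueError("block dimensions must be even")
    energy = 0.0
    count = 0
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            a, b = block[r][c], block[r][c + 1]
            d, e = block[r + 1][c], block[r + 1][c + 1]
            col_diff = (a - b + d - e) / 2.0   # change across columns
            row_diff = (a + b - d - e) / 2.0   # change across rows
            diag_diff = (a - b - d + e) / 2.0  # diagonal change
            energy += col_diff ** 2 + row_diff ** 2 + diag_diff ** 2
            count += 3
    return energy / count


def classify_blocks(image: Image, block_size: int = 8,
                    threshold: float = 0.05) -> List[List[str]]:
    """Label each full block 'text' or 'non-text' by its detail energy.

    Assumption: the threshold is illustrative and would need tuning on
    real data; partial blocks at the image border are skipped.
    """
    labels = []
    for top in range(0, len(image) - block_size + 1, block_size):
        row_labels = []
        for left in range(0, len(image[0]) - block_size + 1, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            is_text = haar_detail_energy(block) > threshold
            row_labels.append("text" if is_text else "non-text")
        labels.append(row_labels)
    return labels


# Synthetic 8x16 image: left half has alternating dark/light columns
# (a crude stand-in for text strokes), right half is flat background.
striped = [float(c % 2) for c in range(8)]
flat = [0.5] * 8
demo_image = [striped + flat for _ in range(8)]
demo_labels = classify_blocks(demo_image)
```

On this synthetic input the striped block is labelled as text and the flat block as non-text. A real document would need a tuned threshold and probably several decomposition levels, since the abstract also mentions characterizing font sizes, which a single scale cannot capture.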