In this study, the authors propose SUGAMAN (Supervised and Unified framework using Grammar and Annotation Model for Access and Navigation), a framework named after a Hindi word meaning ‘easy passage from one place to another’. SUGAMAN synthesises a textual description from a given floor plan image, which visually impaired people can use to navigate by understanding the arrangement of rooms and furniture. It is the first framework for describing a floor plan and giving directions for obstacle-free movement within a building. The model learns five room categories from 1355 room image samples under a supervised learning paradigm. These learned annotations are fed into a description synthesis framework to yield a holistic description of a floor plan image. The authors demonstrate the performance of various supervised classifiers on room learning and provide a comparative analysis of system-generated and human-written descriptions. The contributions of this study include a novel framework for description generation from document images with graphics, a new feature for representing floor plans, text annotations for a publicly available data set, and an algorithm for door-to-door obstacle-avoidance navigation. This work can be applied to areas such as understanding floor plans and the design of historical monuments, and retrieval. © The Institution of Engineering and Technology 2019