We focus on synthesizing a textual description of a given building floor plan image from a first-person vision perspective. Tasks such as symbol spotting, wall and decor segmentation, and semantic and perceptual segmentation have previously been performed on floor plans. Here, for the first time, we propose an end-to-end framework for first-person-vision-based textual description synthesis of building floor plans. We demonstrate, qualitatively and quantitatively, that the proposed framework gives state-of-the-art performance on challenging, real-world floor plan images. Potential applications of this work include floor plan understanding, stability analysis of buildings, and retrieval. © 2018 IEEE.