Head direction (HD) cells, grid cells, and place cells, often dubbed spatial cells, are neural correlates of spatial navigation. We propose a computational model to study the influence of multiple sensory modalities, especially vision and proprioception, on the responses of these cells. A virtual animal was made to navigate within a square box along a synthetic trajectory. Visual information was obtained via a cue card placed at a specific location in the environment, while proprioceptive information was derived from curvature-modulated limb oscillations associated with the gait of the virtual animal. A self-organizing layer was used to encode HD information from both sensory streams. The sensory integration (SI) of HD from the two modalities was performed using a continuous attractor network with local connectivity, followed by oscillatory path integration and a lateral anti-Hebbian network, where spatial cell responses were observed. The model captured experimental findings on the effects of visual manipulations (cue card removal and cue card rotation) on these spatial cells. The model showed a more stable formation of spatial representations via the visual pathway than via the proprioceptive pathway, emphasizing the role of visual input as an anchor for HD, grid, and place responses. The model suggests that SI at the HD level is needed for the formation of such stable representations of space, which are essential for effective navigation. © 2016 IEEE.
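To make the sensory-integration step concrete, the sketch below illustrates the general idea of a continuous attractor ring network with local connectivity that merges two noisy HD estimates, one "visual" and one "proprioceptive", into a single stable bump of activity. This is a minimal illustrative example, not the authors' implementation: the network size, time constant, input weighting, and tuning widths are all assumed values chosen only for demonstration.

```python
# Minimal sketch (not the paper's model): a 1D ring attractor integrating a
# visual and a proprioceptive head-direction (HD) estimate into one bump.
# All parameter values (N, tau, kappa, cue weights) are illustrative assumptions.
import numpy as np

N = 100                                          # HD units tiling 0..2*pi
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)

def bump_input(angle, kappa=8.0):
    """Von Mises-shaped feedforward drive centred on a cue's HD estimate."""
    return np.exp(kappa * (np.cos(theta - angle) - 1.0))

# Local (centre-surround) recurrent weights on the ring: nearby units excite
# each other, distant units receive a uniform inhibitory offset.
diff = theta[:, None] - theta[None, :]
W = 1.5 * np.exp(8.0 * (np.cos(diff) - 1.0)) - 0.3

def decode(r):
    """Population-vector readout of the bump's preferred direction."""
    return np.angle(np.sum(r * np.exp(1j * theta)))

# Two noisy sensory HD estimates of the same true heading (60 degrees here),
# with the visual cue weighted more strongly than the proprioceptive one.
true_hd = np.deg2rad(60.0)
visual_hd = true_hd + np.random.normal(0, 0.05)
proprio_hd = true_hd + np.random.normal(0, 0.30)
ext = 0.7 * bump_input(visual_hd) + 0.3 * bump_input(proprio_hd)

# Relax the attractor dynamics: tau * dr/dt = -r + [W r / N + ext]_+
r = np.zeros(N)
tau, dt = 10.0, 1.0
for _ in range(500):
    r += (dt / tau) * (-r + np.maximum(W @ r / N + ext, 0.0))

print(f"integrated HD estimate: {np.rad2deg(decode(r)) % 360:.1f} deg")
```

Under these assumed settings the recurrent dynamics settle into a single bump whose decoded direction lies between the two cue estimates, biased toward the more reliable (visual) one, which mirrors the abstract's point that vision acts as the anchoring cue for the HD representation.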