Deep learning models that incorporate attention mechanisms have been shown to improve the performance and efficiency of computer vision tasks such as pattern recognition, object detection, and face recognition. Although human visual attention is the source of inspiration for these models, recent attention models treat `attention' as a pure machine vision optimization problem, and visual attention itself remains the most neglected aspect. This paper therefore presents a collaborative human and machine attention module that considers both human visual attention and the network's attention. The proposed module is inspired by the dorsal ('where') pathway of visual processing and can be integrated with any convolutional neural network (CNN) model. The module first computes a spatial attention map from the input feature maps and then combines it with visual attention maps, which are created from eye fixations recorded in an eye-tracking experiment with human participants. The visual attention map covers the highly salient and discriminative image regions, since humans tend to focus on such regions, whereas the spatial attention map processes the other relevant image regions. Combining the two maps yields a finer refinement of the feature maps, which in turn improves performance. A comparative analysis shows that our model not only improves significantly over the baseline model but also outperforms the other models considered. We hope that our findings with the collaborative human-machine attention module will prove useful in other computer vision tasks as well.
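
The data flow described above (a spatial attention map from the feature maps, a visual attention map from eye fixations, and a combination of the two that refines the features) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the function names, the pooling-plus-sigmoid form of the spatial attention, the Gaussian fixation map with width `sigma`, and the weighted-average fusion with weight `alpha` are all assumptions made for illustration only.

```python
import numpy as np


def spatial_attention(features):
    """Machine attention map of shape (H, W) from features of shape (C, H, W).

    Assumption: channel-wise average and max pooling are summed and squashed
    with a sigmoid; a trained module would typically learn this mapping.
    """
    avg_pool = features.mean(axis=0)
    max_pool = features.max(axis=0)
    return 1.0 / (1.0 + np.exp(-(avg_pool + max_pool)))


def fixation_map(fixations, height, width, sigma=1.5):
    """Visual attention map of shape (H, W) from (row, col) eye fixations.

    Assumption: each fixation contributes an isotropic Gaussian blob, and the
    summed map is scaled so that its peak equals 1.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for r, c in fixations:
        heat += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def collaborative_attention(features, visual_map, alpha=0.5):
    """Fuse human and machine maps, then reweight every channel.

    Assumption: the fusion is a weighted average controlled by alpha.
    """
    machine_map = spatial_attention(features)
    combined = alpha * visual_map + (1.0 - alpha) * machine_map
    # Broadcast the (H, W) map over the channel axis.
    return features * combined[None, :, :], combined


# Hypothetical inputs: random features and two made-up fixation points.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 6, 6))
visual = fixation_map([(2, 3), (4, 1)], height=6, width=6)
refined, combined = collaborative_attention(features, visual)
print(refined.shape, combined.shape)
```

In this sketch the fused map stays in the range [0, 1], so it can only attenuate features; regions that neither the fixations nor the network emphasize are suppressed the most.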