Computer vision is the field of artificial intelligence concerned with enabling computers to extract information from digital images and videos and to perform tasks that human vision can perform. It is widely used in robotics to design guidance systems that identify and locate objects in the robot's field of view. This work is an application-specific project that enables a half-humanoid robot to estimate the 6D pose and bounding boxes of its hand and of other objects within its field of view. We add an edge prediction head to the NOCS (Normalised Object Coordinate Space) model; this head predicts the edges of each object from the predicted instance masks. An additional edge agreement loss, computed from the predicted edges, is added to the total loss. This increases the attention paid to edges and improves the accuracy of the predicted instance masks. The edge-attention-aided model is initialized, via transfer learning, with weights pre-trained on the CAMERA and REAL datasets. The backbone layers of the model are frozen, and only the head layers are trained, using a synthetic dataset (the HAND dataset) that we created with Blender. The model gives promising results when tested on objects under varying lighting conditions and at different distances from the camera. Applying transfer learning to a model as large as NOCS allows us to train it for a new class by training only the top few layers on a comparatively small dataset. © 2021 IEEE.
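
The edge agreement loss described above can be illustrated with a minimal sketch. The abstract does not specify the edge filter, the norm, or the loss weight, so the Sobel filters, the mean squared error, and the `edge_weight` parameter below are assumptions, and every function name is hypothetical rather than taken from the NOCS code base.

```python
import numpy as np

# Assumed edge filters: 3x3 Sobel kernels (the abstract does not name a filter).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter2d(image, kernel):
    """Cross-correlate a 2D image with a 3x3 kernel, replicating border pixels."""
    image = np.asarray(image, dtype=np.float64)
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    h, w = image.shape
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def edge_map(mask):
    """Gradient magnitude of a (soft or binary) instance mask."""
    gx = filter2d(mask, SOBEL_X)
    gy = filter2d(mask, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


def edge_agreement_loss(pred_mask, true_mask):
    """Mean squared difference between predicted and ground-truth edge maps.

    Using the L2 norm here is an assumption made for illustration.
    """
    diff = edge_map(pred_mask) - edge_map(true_mask)
    return float(np.mean(diff ** 2))


def total_loss(base_loss, pred_mask, true_mask, edge_weight=1.0):
    """Add the edge term to an existing loss; edge_weight is a hypothetical knob."""
    return base_loss + edge_weight * edge_agreement_loss(pred_mask, true_mask)


if __name__ == "__main__":
    true_mask = np.zeros((8, 8))
    true_mask[2:6, 2:6] = 1.0
    pred_mask = np.zeros((8, 8))
    pred_mask[3:7, 3:7] = 1.0  # prediction shifted by one pixel
    print("edge term:", edge_agreement_loss(pred_mask, true_mask))
    print("total:", total_loss(0.5, pred_mask, true_mask))
```

In this sketch the edge term is zero when the predicted and ground-truth masks coincide and grows as their boundaries diverge, which is how such a term can direct training toward the object edges.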