Object detection in aerial images has gained significant attention in recent years because of its extensive use in civilian and military reconnaissance and surveillance applications. With the advent of Unmanned Aerial Vehicles (UAVs), the scope for performing such surveillance tasks has increased. The small size of objects in aerial images makes them very difficult to detect. Two-stage region-based Convolutional Neural Network frameworks for object detection have proven quite effective. The main problem with these frameworks is their low speed compared to one-stage object detectors, which stems from the computational complexity of generating region proposals. Region-based methods also suffer from poor localization of objects, which leads to a significant number of false positives. This paper aims to provide a solution to the problems faced in real-time vehicle detection in aerial images and videos. The proposed approach uses hyper feature maps generated by a skip-connected convolutional network. The hyper feature maps are then passed through a region proposal network to generate object-like proposals accurately. The issue of detecting objects that resemble the background is addressed by modifying the loss function of the proposal network. The performance of the proposed network has been evaluated on the publicly available VEDAI dataset. © 2019, Springer Nature Switzerland AG.