We propose a novel framework for detecting and localizing objects in images that contain appreciable clutter and occlusion. The problem is cast as statistical hypothesis testing. The image under test is converted into a set of local features using affine-invariant local region detectors, and each region is described with the SIFT descriptor. Because of clutter and occlusion, this set is expected to contain features that do not belong to the object. We sample subsets of local features from this set and test the alternative hypothesis that the object is present against the null hypothesis that it is absent. We then use a method similar to the recently proposed spatial scan statistic to refine the object localization estimates obtained from the sampling process. We demonstrate results on two datasets, TUD Motorbikes and TUD Cars. The TUD Cars dataset contains background clutter, while the TUD Motorbikes dataset is recognized to have substantial variation in scale, background, illumination, and viewpoint, as well as occlusions. © 2008 IEEE.
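The sample-and-test step described above can be illustrated with a minimal sketch. The abstract does not specify the test statistic, so this assumes a simple log-likelihood ratio summed over independently treated features. The function names (`subset_llr`, `sample_and_test`), the per-feature log-likelihood inputs, and the threshold are hypothetical and chosen only for illustration. Feature extraction and the scan-statistic refinement are not covered.

```python
import random

def subset_llr(log_p_object, log_p_background, subset):
    """Log-likelihood ratio of a feature subset.

    Assumption: features are treated as independent, so the ratio of
    'object present' to 'object absent' is a sum of per-feature terms.
    """
    return sum(log_p_object[i] - log_p_background[i] for i in subset)

def sample_and_test(log_p_object, log_p_background,
                    subset_size, n_samples, threshold, seed=0):
    """Draw random feature subsets and keep those that reject the null.

    Returns (llr, sorted feature indices) pairs whose ratio exceeds the
    hypothetical threshold, sorted with the strongest evidence first.
    """
    rng = random.Random(seed)
    indices = range(len(log_p_object))
    accepted = []
    for _ in range(n_samples):
        subset = rng.sample(indices, subset_size)
        llr = subset_llr(log_p_object, log_p_background, subset)
        if llr > threshold:  # reject "object absent" for this subset
            accepted.append((llr, sorted(subset)))
    accepted.sort(reverse=True)
    return accepted

# Toy input: features 0-2 fit the object model, features 3-5 look like clutter.
log_p_obj = [-1, -1, -1, -5, -5, -5]
log_p_bg = [-4, -4, -4, -1, -1, -1]
hits = sample_and_test(log_p_obj, log_p_bg,
                       subset_size=3, n_samples=200, threshold=8)
```

In this toy input, only a subset made up entirely of features 0 to 2 can exceed the threshold, which reflects the motivation for sampling: a subset free of clutter features yields strong evidence for the object even when the full feature set is contaminated.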