Unsupervised clustering is an important tool for analyzing video data. The choice of a clustering scheme is governed by the suitability of the clusters it produces, but suitability criteria are difficult to formulate in a domain where different feature attributes have different meanings. We propose a novel clustering strategy tailored to the specific requirements of video data. Our methodology decouples clustering along the different feature components and chooses the clustering model to meet those requirements. The resulting clusters reasonably model the homogeneous color regions of a video scene in both space and time, and these space-time clusters can subsequently be grouped to compose meaningful objects. Experimental comparison with existing clustering techniques shows that our scheme addresses many of the problems that traditional clustering schemes encounter when applied to the heterogeneous feature space of video. © 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.