We propose a computational model that generates an interpretation of a video shot based on the principle of perceptual prominence, which we introduce. We also formulate the perceptual grouping problem in the spatio-temporal domain in order to identify perceptual clusters. We illustrate our approach with experimental results.