We present a novel learning-based framework for detecting interesting events in soccer videos. The input to the system is a raw soccer video. Learning takes place at three levels, all on the basis of training video data: Support Vector Machines (hereafter, SVMs) learn to detect interesting low-level features from image and video data, while a hierarchical Conditional Random Field (hereafter, CRF) based methodology learns both the dependencies among mid-level features and their relation to the low-level features, and the high-level decisions ('interesting events') and their relation to the mid-level features. Descriptors are spatio-temporal in nature: they can be associated with a region in an image or with a set of frames. Temporal patterns of descriptors characterise an event. We apply this framework to parse soccer videos into Interesting (a goal or a goal miss) and Non-Interesting videos. We present results of numerous experiments in support of the proposed strategy. © 2008 IEEE.