In this paper, we address the problem of content-based video retrieval using multivariate time series modeling of features. We focus in particular on representing the dynamics of geometric features on the Spatio-Temporal Volume (STV) created from a real-world video shot. The STV intrinsically holds the video content by capturing how the appearance of the foreground object changes over time, and can therefore be treated as a dynamical system. We capture the geometric properties of the parameterized STV using the Gaussian curvature computed at each point on its surface. The change in Gaussian curvature over time is then modeled as a Linear Dynamical System (LDS). Because it can efficiently model the dynamics of a multivariate signal, an Auto Regressive Moving Average (ARMA) model is used to represent the time series data, and the parameters of the ARMA model are then used for video content representation. To discriminate between a pair of video shots (time series), we use the subspace angle between a pair of feature vectors formed from the ARMA model parameters. Experiments are conducted on four publicly available benchmark datasets, all shot using a static camera. We present both qualitative and quantitative analyses of the proposed framework. Comparative results against three recent works on video retrieval also show the efficiency of the proposed framework. © 2013, Springer-Verlag London.
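
As an illustration of the modeling and comparison steps summarized above, the following is a minimal sketch and not the implementation used in this work. It assumes that each shot has already been reduced to a matrix of per-frame Gaussian curvature features, fits an LDS with a closed-form SVD estimate (a common suboptimal estimator, assumed here), and compares two shots through the principal angles between truncated observability matrices. The function names, the state dimension `n_states`, the truncation length `n_blocks`, and the sum-of-squared-sines distance are all assumptions made for the example.

```python
import numpy as np

def fit_lds(Y, n_states):
    """Estimate (A, C) for x[t+1] = A x[t] + v[t], y[t] = C x[t] + w[t].

    Y is a (p, T) array with one feature vector per time step.
    Uses a closed-form SVD estimate (an assumption of this sketch).
    """
    Y = Y - Y.mean(axis=1, keepdims=True)          # remove the temporal mean
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                            # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states, :]   # estimated state sequence
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])       # least-squares transition
    return A, C

def observability(A, C, n_blocks):
    """Stack C, CA, CA^2, ... into a truncated observability matrix."""
    blocks = [C]
    for _ in range(n_blocks - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def principal_angles(M1, M2):
    """Principal angles between the column spaces of M1 and M2."""
    Q1, _ = np.linalg.qr(M1)
    Q2, _ = np.linalg.qr(M2)
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def lds_distance(Y1, Y2, n_states=3, n_blocks=5):
    """Sum of squared sines of the principal angles (one possible choice)."""
    O1 = observability(*fit_lds(Y1, n_states), n_blocks)
    O2 = observability(*fit_lds(Y2, n_states), n_blocks)
    theta = principal_angles(O1, O2)
    return float(np.sum(np.sin(theta) ** 2))

# Synthetic stand-ins for two shots: 6 curvature features over 40 frames.
rng = np.random.default_rng(0)
shot_a = rng.standard_normal((6, 40))
shot_b = rng.standard_normal((6, 40))
d_same = lds_distance(shot_a, shot_a)
d_diff = lds_distance(shot_a, shot_b)
```

In this sketch a shot compared with itself yields a distance near zero, and retrieval would rank database shots by increasing distance to the query. The truncated observability matrix is only a finite approximation of the subspace that an exact subspace-angle computation between ARMA models would use.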