We present an approach for automatically identifying the script of text localized in scene images. Our approach is inspired by advances in mid-level features. We represent text images using mid-level features pooled from densely computed local features, and then use an off-the-shelf classifier to identify the script of each text image. Our approach is efficient and requires very little labeled data. We evaluate our method on the recently introduced CVSI dataset, where it correctly identifies the script of 96.70% of the text images. In addition, we introduce and benchmark a more challenging Indian Language Scene Text (ILST) dataset for evaluating the performance of our method. © 2016 IEEE.
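
The pipeline summarized above (dense local features, a pooled image-level representation, then a standard classifier) can be illustrated with a minimal sketch. It is not the method of this paper: the mid-level features are replaced here by a plain bag-of-visual-words histogram over raw pixel patches, the off-the-shelf classifier by a hand-written nearest-centroid rule, and the word images by synthetic striped patterns. All function names, patch sizes, and the codebook size are assumptions chosen for illustration only.

```python
import numpy as np

def dense_patches(image, size=4, stride=2):
    """Flattened size x size patches sampled on a regular grid (assumed descriptor)."""
    h, w = image.shape
    return np.array([image[y:y + size, x:x + size].ravel()
                     for y in range(0, h - size + 1, stride)
                     for x in range(0, w - size + 1, stride)], dtype=float)

def nearest(descriptors, centers):
    """Index of the closest center for every descriptor (squared Euclidean)."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def build_codebook(descriptors, k=4, iters=5):
    """k-means codebook with deterministic farthest-point initialisation."""
    centers = [descriptors[0]]
    for _ in range(k - 1):
        d = np.min([((descriptors - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(descriptors[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = nearest(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def pooled_histogram(image, codebook):
    """Pool dense descriptors into an L1-normalised codeword histogram."""
    labels = nearest(dense_patches(image), codebook)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

class NearestCentroid:
    """Stand-in classifier: assign each sample to the closest class mean."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def toy_image(kind, rng):
    """Synthetic stand-in for a cropped word image: stripes plus mild noise."""
    stripes = (np.arange(16) // 2) % 2
    img = np.tile(stripes, (16, 1)).astype(float)  # vertical stripes
    if kind == 1:
        img = img.T                                # horizontal stripes
    return img + 0.05 * rng.standard_normal((16, 16))

rng = np.random.default_rng(0)
train_labels = np.array([0, 1] * 5)   # two hypothetical "scripts"
test_labels = np.array([0, 1] * 3)
train_images = [toy_image(k, rng) for k in train_labels]
test_images = [toy_image(k, rng) for k in test_labels]

codebook = build_codebook(np.vstack([dense_patches(im) for im in train_images]))
X_train = np.array([pooled_histogram(im, codebook) for im in train_images])
X_test = np.array([pooled_histogram(im, codebook) for im in test_images])
predictions = NearestCentroid().fit(X_train, train_labels).predict(X_test)
accuracy = float((predictions == test_labels).mean())
print(f"toy accuracy: {accuracy:.2f}")
```

The toy accuracy printed here says nothing about the CVSI or ILST results; the sketch only shows how pooling dense local descriptors yields a fixed-length vector that any standard classifier can consume.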