In this paper we present an algorithm for detecting text in video frames of lecture slides. We first apply a multi-channel wavelet transform and merge the channel components of the high-frequency sub-bands into a composite energy map. Thresholding the energy map yields an edge map of candidate text pixels; some of these correspond to actual text, while others correspond to graphics, logos, tables, etc. The connected components of the edge map are then filtered with a trained classifier to reject some of the false positives. Rectangular text blocks that compactly enclose the text regions are then identified through selective dilation and recursive splitting. Any false-positive text blocks that remain are rejected using heuristics. Experiments on 890 images show that our scheme achieves a lower false positive rate and a lower misdetection rate than two existing scene text detection methods. © 2012 ICPR Org Committee.
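The first two stages (wavelet transform, energy map, thresholding) can be illustrated with a minimal sketch. The abstract does not name the wavelet, the rule for merging channels, or the threshold, so everything specific here is an assumption made for illustration: a single-level Haar transform with averaging normalization, a root-sum-of-squares merge over the LH, HL and HH sub-bands of all channels, and a fixed threshold of 50. The function names `haar_level1`, `energy_map` and `threshold_map` are hypothetical and are not taken from the paper.

```python
import math

def haar_level1(channel):
    """Single-level 2D Haar transform of a 2D list with even dimensions.

    Returns (LL, LH, HL, HH), each half the input size. Averaging
    normalization (divide by 4) is an assumption of this sketch.
    """
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(channel), 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, len(channel[0]), 2):
            a, b = channel[r][c], channel[r][c + 1]
            d, e = channel[r + 1][c], channel[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 4.0)  # local average
            lh_row.append((a + b - d - e) / 4.0)  # difference between rows
            hl_row.append((a - b + d - e) / 4.0)  # difference between columns
            hh_row.append((a - b - d + e) / 4.0)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh

def energy_map(channels):
    """Merge the high-frequency sub-bands of all channels into one map.

    Root-sum-of-squares is an assumed merge rule, not the paper's.
    """
    bands = [haar_level1(ch)[1:] for ch in channels]  # drop LL, keep LH/HL/HH
    rows, cols = len(bands[0][0]), len(bands[0][0][0])
    energy = [[0.0] * cols for _ in range(rows)]
    for lh, hl, hh in bands:
        for r in range(rows):
            for c in range(cols):
                energy[r][c] += lh[r][c] ** 2 + hl[r][c] ** 2 + hh[r][c] ** 2
    return [[math.sqrt(v) for v in row] for row in energy]

def threshold_map(energy, threshold):
    """Binary edge map: 1 marks a candidate text pixel, 0 marks background."""
    return [[1 if v > threshold else 0 for v in row] for row in energy]

# Toy 4x4 frame with a vertical edge between columns 0 and 1,
# replicated across three identical colour channels.
gray = [[0, 255, 255, 255] for _ in range(4)]
edges = threshold_map(energy_map([gray, gray, gray]), 50.0)
print(edges)  # prints [[1, 0], [1, 0]]
```

In a full pipeline, the resulting edge map would be passed to connected-component labelling and the trained classifier described above; those later stages are not sketched here because the abstract gives no details about them.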