We address the problem of image retrieval using textual queries. In particular, we focus on descriptive queries that can be either in the form of simple captions (e.g., "a brown cat sleeping on a sofa"), or even long descriptions with multiple sentences. We present a probabilistic approach that seamlessly integrates visual and textual information for the task. It relies on linguistically and syntactically motivated mid-level textual patterns (or phrases) that are automatically extracted from available descriptions. At the time of retrieval, the given query is decomposed into such phrases, and images are ranked based on their joint relevance with these phrases. Experiments on two popular datasets (UIUC Pascal Sentence and IAPR-TC12 benchmark) demonstrate that our approach effectively retrieves semantically meaningful images, and outperforms baseline methods. © 2015 ACM.
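The retrieval scheme described above, decomposing a query into phrases and ranking images by joint relevance, can be sketched as follows. This is a minimal illustrative sketch only: the bigram "phrase" extractor and the hand-set per-image phrase probabilities are stand-ins, not the paper's learned linguistic patterns or probabilistic model.

```python
import math

def extract_phrases(query):
    """Naive stand-in phrase extractor: word bigrams of the query."""
    words = query.lower().split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

def score(image_probs, phrases, eps=1e-6):
    """Joint relevance as a sum of log-probabilities of each phrase
    given the image; unseen phrases get a small smoothing probability."""
    return sum(math.log(image_probs.get(p, eps)) for p in phrases)

def rank(images, query):
    """Rank image names by decreasing joint relevance to the query."""
    phrases = extract_phrases(query)
    return sorted(images, key=lambda name: score(images[name], phrases),
                  reverse=True)

# Hypothetical per-image phrase probabilities for illustration.
images = {
    "img1.jpg": {"brown cat": 0.4, "cat sleeping": 0.3, "on sofa": 0.2},
    "img2.jpg": {"brown dog": 0.5, "dog running": 0.4},
}
print(rank(images, "brown cat sleeping on a sofa"))
```

Summing log-probabilities (rather than multiplying raw probabilities) keeps the score numerically stable for long, multi-sentence queries with many phrases.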