We study the problem of retrieving cartoon faces of celebrities given a real face photo as the query. We refer to this problem as Photo2Cartoon. Photo2Cartoon is challenging because (i) cartoons vary widely in style and (ii) the modality gap between real and cartoon faces is large. To address these challenges, we present a discriminative deep metric learning approach designed for matching cross-modal faces and demonstrate it on Photo2Cartoon. The proposed approach learns a nonlinear transformation that projects real and cartoon face pairs into a common subspace in which the distance between positive pairs is smaller than the distance between negative pairs. We evaluate our method on two public benchmarks, IIIT-CFW and Viewed Sketch, and show retrieval results superior to those of related methods. © 2018, Springer-Verlag London Ltd., part of Springer Nature.
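
The distance constraint described above can be illustrated with a minimal sketch. The abstract does not specify the loss function or the network architecture, so everything here is an assumption made for illustration: a single tanh layer stands in for the learned nonlinear transformation, each modality gets its own hypothetical weight matrix (`W_photo`, `W_cartoon`), and a triplet-style hinge loss with a hypothetical `margin` parameter stands in for the actual objective.

```python
import math

def project(x, weights, bias):
    """Assumed nonlinear projection tanh(Wx + b) into the common subspace."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def margin_loss(photo, cartoon_pos, cartoon_neg, margin=1.0):
    """Hinge loss (an assumption, not the paper's stated objective).

    It is zero once the negative cartoon is farther from the photo
    than the positive cartoon by at least `margin`.
    """
    d_pos = euclidean(photo, cartoon_pos)
    d_neg = euclidean(photo, cartoon_neg)
    return max(0.0, d_pos - d_neg + margin)

# Toy, hand-picked parameters; in practice these would be learned.
W_photo = [[1.0, 0.0], [0.0, 1.0]]
W_cartoon = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

photo = project([1.0, 0.0], W_photo, b)    # real face feature (query)
pos = project([2.0, 0.0], W_cartoon, b)    # cartoon of the same identity
neg = project([0.0, 2.0], W_cartoon, b)    # cartoon of a different identity

loss = margin_loss(photo, pos, neg, margin=0.5)
swapped = margin_loss(photo, neg, pos, margin=0.5)
print(loss, swapped)
```

In this toy setup the positive pair maps to the same point in the subspace, so the constraint is satisfied and the loss vanishes; swapping the positive and negative cartoons violates the constraint and produces a positive loss, which is the signal a training procedure would minimize.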