In this work, we address the problem of extracting high-dimensional, soft semantic feature descriptors for every pixel in an image using a deep learning framework. Existing methods rely on a metric learning objective called the multi-class N-pair loss, which requires comparing each positive example (a same-class pixel) against all negative examples (different-class pixels). Computing this loss over all possible pixel pairs in an image creates a severe computational bottleneck. We show that this overhead can be greatly reduced by learning the metric over superpixels instead of individual pixels. This also preserves the global semantic context of the image, which pixel-wise computation loses when pairs are subsampled to keep the number of comparisons tractable. We design an end-to-end trainable network with this loss function and give a detailed comparison of the two feature extraction methods, pixel-based and superpixel-based. We also investigate hard semantic labeling of these soft semantic feature descriptors. © Springer Nature Singapore Pte Ltd 2020.
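The core idea, superpixel pooling followed by a multi-class N-pair loss, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the superpixel map is assumed to come from an external oversegmentation (e.g. SLIC), and one positive per anchor is used, per the standard N-pair formulation.

```python
import numpy as np

def superpixel_pool(features, segments):
    """Mean-pool per-pixel features (H, W, D) into one embedding per superpixel.

    `segments` is an (H, W) integer map of superpixel ids. Pooling shrinks the
    number of embeddings entering the loss from H*W pixels to the (much
    smaller) number of superpixels, which is the source of the speed-up.
    """
    ids = np.unique(segments)
    flat = features.reshape(-1, features.shape[-1])
    seg = segments.ravel()
    return ids, np.stack([flat[seg == i].mean(axis=0) for i in ids])

def npair_loss(embeddings, labels):
    """Multi-class N-pair loss over embeddings with per-embedding class labels.

    For each anchor f with positive f+ and negatives {f-}:
        L = log(1 + sum_neg exp(f . f-  -  f . f+))
    averaged over all anchors that have at least one positive and one negative.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    total, count = 0.0, 0
    n = len(embeddings)
    for a in range(n):
        pos = [j for j in range(n) if j != a and labels[j] == labels[a]]
        neg = [j for j in range(n) if labels[j] != labels[a]]
        if not pos or not neg:
            continue
        f = embeddings[a]
        fp = embeddings[pos[0]]                  # one positive per anchor
        logits = embeddings[neg] @ f - f @ fp    # f.f_neg - f.f_pos per negative
        total += np.log1p(np.exp(logits).sum())
        count += 1
    return total / max(count, 1)
```

In a training loop, the pooled superpixel embeddings and their (majority-vote) class labels would replace sampled pixel pairs in the loss, so no subsampling of pixels is needed and each superpixel embedding summarizes a coherent image region.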