In this paper, we propose to synthesize natural images from a set of input objects. The proposed technique generates a scene which has high correlation with the provided set of input objects while also maintaining the natural placement of objects within the scene. The technique constitutes of a generative adversarial network trained on a large corpus of objects and natural scenes. This is in contrast with earlier works where the objective was to generate a natural scene from a noise vector or conditioning the network over a variable. However, such methods have limitations in their ability to control the objects within the generated images. On the contrary, we show that by training a Generative Adversarial Network with raw image pixels as input, we can generate scenes which constitute the objects as well as generate the surrounding environment suitable for the combination of the input objects. We provide qualitative and quantitative results on challenging MS-COCO dataset to show the effectiveness of the proposed technique. © 2019 IEEE.