Integrating a learning architecture with SDN-based VANETs (SDVN) makes better use of computing power by decoupling network management services from data transfer services. However, fast dissemination of safety messages in a highly dynamic vehicular environment remains challenging because of bi-directional traffic and the directional movement of vehicles. It is also difficult to avoid bottlenecks and to build a reliable, fault-tolerant SDN network using clustering. Exploiting the features of adaptive learning, in this paper we propose an adaptive self-learning clustering algorithm with reinforcement routing in SDVN, called RL-SDVN. An Expectation-Maximization model is used to predict a vehicle's movement, and a Q-learning model is then used to route data packets, so that vehicles in the same cluster coordinate with each other to find optimum routes. We evaluate our approach experimentally by comparing it with previously proposed clustering and self-learning based schemes. The results show that the proposed scheme improves cluster stability and the lifetime of a cluster member vehicle, and achieves lower average transmission delay and higher throughput than the existing routing protocols considered in this study. © 2020 IEEE.
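
To illustrate the Q-learning routing idea named in the abstract, the following is a minimal sketch of tabular Q-learning for next-hop selection. It is not the RL-SDVN algorithm itself: the four-node topology, the per-hop delays, the delivery bonus, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all hypothetical choices made for this example.

```python
import random

# Hypothetical toy topology (node -> candidate next hops); not from the paper.
NEIGHBOURS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
# Hypothetical per-hop transmission delays (lower is better).
DELAY = {("A", "B"): 1.0, ("A", "C"): 3.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
DEST = "D"
# Assumed learning rate, discount factor, and exploration rate.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2


def train(episodes=500, seed=0):
    """Learn Q(node, next_hop) by repeatedly forwarding a packet from A to DEST."""
    rng = random.Random(seed)
    q = {(n, h): 0.0 for n, hops in NEIGHBOURS.items() for h in hops}
    for _ in range(episodes):
        node = "A"
        while node != DEST:
            hops = NEIGHBOURS[node]
            # Epsilon-greedy choice between exploring and exploiting.
            if rng.random() < EPSILON:
                nxt = rng.choice(hops)
            else:
                nxt = max(hops, key=lambda h: q[(node, h)])
            # Reward: negative delay, plus an assumed bonus on delivery.
            reward = -DELAY[(node, nxt)] + (10.0 if nxt == DEST else 0.0)
            future = max((q[(nxt, h)] for h in NEIGHBOURS[nxt]), default=0.0)
            # Standard Q-learning update.
            q[(node, nxt)] += ALPHA * (reward + GAMMA * future - q[(node, nxt)])
            node = nxt
    return q


def best_route(q, src="A"):
    """Follow the greedy (highest-Q) next hop from src until DEST is reached."""
    route = [src]
    while route[-1] != DEST:
        node = route[-1]
        route.append(max(NEIGHBOURS[node], key=lambda h: q[(node, h)]))
    return route
```

Calling `best_route(train())` on this toy topology follows the lower-delay path through `B`, since the learned value of forwarding from `A` to `B` exceeds that of forwarding from `A` to `C`.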