Header menu link for other important links
X
Scalable clustering and applications
K.I. Shahid,
Published in Association for Computing Machinery
2016
Abstract
Large scale machine learning is becoming an active research area recently. Most of the existing clustering algorithms cannot handle big data due to its high time and space complexity. Among the clustering algorithms, eigen vector based clustering, such as Spectral clustering, shows very good accuracy, but it has cubic time complexity. There are various methods proposed to reduce the time and space complexity for eigen decomposition such as Nyström method, Lanczos method etc. Nyström method has linear time complexity in terms of number of data points, but has cubic time complexity in terms of number of sampling points. To reduce this, various Rank k approximation methods also proposed, but which are less efficient compare to the normalized spectral clustering. In this paper we propose a two step algorithm for spectral clustering to reduce the time complexity to O(nmk + m2k′), by combining both Nyström and Lanczos method, where k is the number of clusters and k′ is the rank k approximation of the sampling matrix (k < k′ < < m < < n). It shows very good results, with various data sets, image segmentation problems and churn prediction of a telecommunication data set, even with very low sampling (for 10 Million x 10 Million matrix, sampled only 100 columns) with lesser time, which confirms the validity of the algorithm. © 2016 ACM.
About the journal
JournalData powered by TypesetACM International Conference Proceeding Series
PublisherData powered by TypesetAssociation for Computing Machinery