ICML 2011 workshop on unsupervised and transfer learning

Stochastic Unsupervised Learning on Unlabeled Data
Chuanren Liu*, Jianjun Xie**, Hui Xiong, and Yong Ge
*Rutgers, the State University of New Jersey, Newark, New Jersey, USA
**CoreLogic, California, USA

In this paper, we introduce a stochastic unsupervised learning method that we used in the 2011 Unsupervised Learning Challenge. The method preprocesses the data for the subsequent binary classification problems. It performs K-means clustering on principal components rather than on the raw data, which removes the effect of noisy, irrelevant, or less relevant features and makes the results more robust. It also uses a stochastic process to combine multiple cluster assignments for each data point; this stochastic process is designed to alleviate overfitting. The proposed method was effective on all the data sets, and its overall performance ranked second in the challenge.
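The two building blocks named above, projecting the data onto principal components and running K-means in that reduced space several times, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. PCA is computed from the SVD of the centered data. K-means is plain Lloyd iteration with random initial centers. The number of components, clusters, and runs (`n_components`, `k`, `n_runs`) are hypothetical parameters. The abstract does not specify the stochastic process used to combine assignments, so the sketch simply concatenates the one-hot memberships from independently seeded runs. This concatenation is a stand-in and not the method's actual combination step.

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(Z, k, rng, n_iter=50):
    """Plain Lloyd's K-means; returns one cluster index per row of Z."""
    # Random initialization: pick k distinct data points as starting centers.
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_features(X, n_components=10, k=8, n_runs=5, seed=0):
    """Hypothetical combination step: stack one-hot memberships from
    n_runs randomly initialized K-means runs on the PCA projection.
    (Stand-in only; the paper's stochastic combination is not shown here.)"""
    rng = np.random.default_rng(seed)
    Z = pca_project(X, n_components)
    blocks = []
    for _ in range(n_runs):
        labels = kmeans(Z, k, rng)
        blocks.append(np.eye(k)[labels])  # one-hot encoding of assignments
    return np.hstack(blocks)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 50))
    F = cluster_features(X, n_components=5, k=4, n_runs=3)
    print(F.shape)  # (200, 12): 3 runs x 4 clusters
```

Clustering on the leading components rather than on all raw features means that directions carrying little variance, which are often noise, do not enter the distance computation. Repeating the clustering with different random initializations yields several assignments per point that a downstream combination step can use.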