ICML 2011 Workshop on Unsupervised and Transfer Learning
Stochastic Unsupervised Learning on Unlabeled Data
Chuanren Liu*, Jianjun Xie**, Hui Xiong, and Yong Ge
*Rutgers, the State University of New Jersey, Newark, New Jersey, USA
**CoreLogic, California, USA
In this paper, we introduce a stochastic unsupervised learning method
that was used in the 2011 Unsupervised Learning Challenge. The method
preprocesses data for use in subsequent binary classification
problems. It performs K-means clustering on principal components
rather than on the raw data, which reduces the effect of noisy,
irrelevant, and less relevant features and makes the results more
robust. It also uses a stochastic process to combine multiple
clustering assignments for each data point, which is designed to
alleviate overfitting. The proposed method was effective on all the
data sets, and its overall performance ranked second in the challenge.
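
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the clustering runs are randomized or how their assignments are combined. The sketch assumes that each run starts K-means from random initial centers on the same principal-component projection, and that the runs are combined by concatenating one-hot cluster indicators into a new feature matrix for a downstream classifier. The function names (`pca_project`, `kmeans`, `stochastic_cluster_features`) and all parameter values are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project mean-centered data onto its top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(Z, k, rng, n_iter=20):
    """Plain Lloyd's algorithm with random initial centers.

    Returns the cluster label of each row of Z from the last assignment step.
    """
    # Fancy indexing copies the rows, so updating centers does not modify Z.
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (n, k).
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = Z[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return labels


def stochastic_cluster_features(X, n_runs=10, n_components=5, k=8, seed=0):
    """Combine several randomized K-means runs on principal components.

    Assumption (not from the paper): runs differ only in their random
    initial centers, and are combined by concatenating one-hot indicators.
    """
    rng = np.random.default_rng(seed)
    Z = pca_project(X, n_components)
    blocks = []
    for _ in range(n_runs):
        labels = kmeans(Z, k, rng)
        blocks.append(np.eye(k)[labels])  # one-hot encode this run's labels
    return np.hstack(blocks)  # shape: (n_samples, n_runs * k)
```

Under these assumptions, `stochastic_cluster_features(X)` returns one block of `k` indicator columns per run. Averaging over several randomized runs, rather than trusting a single clustering, is one common way to reduce sensitivity to any particular initialization.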