ICML 2011 Workshop on Unsupervised and Transfer Learning

Unsupervised dimensionality reduction via gradient-based matrix factorization with two learning rates and their automatic updates

Vladimir Nikulin and Tian-Hsiang Huang
Department of Mathematics, University of Queensland, Australia

The high dimensionality of the data, with expression levels of thousands of features measured on a much smaller number of samples, presents challenges that affect the applicability of the analytical results. In principle, it would be better to describe the data in terms of a small number of meta-features, derived by matrix factorization, which could reduce noise while still capturing the essential features of the data. Three novel and mutually related methods are presented in this paper: 1) gradient-based matrix factorization with two learning rates (one for each of the two factor matrices) and their automatic updates; 2) a nonparametric criterion for selecting the number of factors; and 3) a nonnegative version of the gradient-based matrix factorization which, in contrast to existing methods, does not require any extra computational cost. We demonstrate the effectiveness of the proposed methods on the supervised classification of gene expression data.
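The abstract does not state the update rules, so the following is only a minimal sketch under stated assumptions of what "gradient-based matrix factorization with two learning rates and their automatic updates" can look like. It approximates a data matrix X (n x p) by A @ B, with A (n x k) and B (k x p), minimizing 0.5 * ||X - A B||^2 by alternating gradient steps. A and B each get their own learning rate. The automatic update used here is a "bold driver" heuristic (grow the rate after a step that lowers the loss, shrink it and reject the step otherwise); this rule, the function name `factorize`, and all parameter names and defaults are hypothetical and are not claimed to be the authors' method. The optional nonnegative mode simply clips each updated factor at zero (projected gradient), which costs one elementwise maximum per step. This is likewise an assumed stand-in, not necessarily the paper's nonnegative variant.

```python
import numpy as np


def factorize(X, k, n_iter=500, lr_a=0.01, lr_b=0.01,
              up=1.05, down=0.5, nonnegative=False, seed=0):
    """Hypothetical sketch: approximate X (n x p) by A @ B, A (n x k), B (k x p).

    A and B each have their own learning rate. Both rates are adapted with an
    assumed 'bold driver' rule, not necessarily the rule used in the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = rng.uniform(0.0, 0.1, size=(n, k))
    B = rng.uniform(0.0, 0.1, size=(k, p))

    def loss(A, B):
        # Half the squared Frobenius norm of the reconstruction error.
        return 0.5 * np.sum((X - A @ B) ** 2)

    def project(M):
        # Assumed nonnegative variant: clip to the nonnegative orthant.
        return np.maximum(M, 0.0) if nonnegative else M

    current = loss(A, B)
    history = [current]
    for _ in range(n_iter):
        # Gradient step on A with its own learning rate lr_a.
        grad_A = -(X - A @ B) @ B.T
        A_new = project(A - lr_a * grad_A)
        new = loss(A_new, B)
        if new < current:          # accept the step and speed up
            A, current = A_new, new
            lr_a *= up
        else:                      # reject the step and slow down
            lr_a *= down

        # Gradient step on B with its own learning rate lr_b.
        grad_B = -A.T @ (X - A @ B)
        B_new = project(B - lr_b * grad_B)
        new = loss(A, B_new)
        if new < current:
            B, current = B_new, new
            lr_b *= up
        else:
            lr_b *= down

        history.append(current)
    return A, B, history
```

Because a step is kept only when it lowers the loss, the recorded loss never increases, and each factor's step size adapts to its own curvature without manual tuning. If X holds samples in rows and features in columns, the rows of A give a k-dimensional meta-feature representation of each sample that could be passed to a classifier. The sketch does not address how k is chosen; the paper's nonparametric criterion for that is not reproduced here.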