Feature Construction: Variations on PCA and Company

Kristin P. Bennett
Math Sciences Department
Rensselaer Polytechnic Institute
Troy, NY 12180

Feature construction techniques based on spectral methods, which construct low-rank approximations of the data, are among the most widely used approaches for dimensionality reduction and visualization. This family of methods, which includes principal component analysis, partial least squares, canonical correlation analysis, and Fisher discriminant analysis, uses various covariance matrices formed from the data to construct these low-rank approximations. The methods are all based on the least-squares loss function, which may not be well suited to every inference task. For example, a more robust loss function may be desired, or sparsity of the solution may be a priority. In this talk we show how one such method, partial least squares (PLS), can be generalized to construct low-rank approximations based on arbitrary loss functions. The new methodology, combined with a spectral method for feature selection, is demonstrated on a high-dimensional problem from bioinformatics. The basic approach can be generalized to other spectral methods.
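
To make the phrase "covariance matrices formed from the data" concrete, here is a minimal NumPy sketch of the two textbook least-squares cases: PCA, which uses the covariance of the inputs, and the first PLS direction, which uses the cross-covariance between inputs and response. It is not the generalized method presented in the talk. The function names, the synthetic data, and the choice of two components are assumptions made purely for illustration.

```python
import numpy as np

def pca_low_rank(X, k):
    """Rank-k least-squares approximation of centered X (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    # Eigen-decompose the (unnormalized) covariance matrix Xc^T Xc.
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)
    # Keep the k eigenvectors with the largest eigenvalues.
    W = evecs[:, np.argsort(evals)[::-1][:k]]
    T = Xc @ W             # constructed features (scores)
    return T, T @ W.T      # features and rank-k reconstruction of Xc

def pls_first_direction(X, y):
    """First PLS weight vector: the normalized cross-covariance Xc^T yc."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

# Synthetic data, assumed only for this illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + 0.1 * rng.normal(size=50)

T, X_hat = pca_low_rank(X, 2)
w = pls_first_direction(X, y)
```

PCA ignores the response entirely, while the PLS direction is driven by it. Both are tied to the squared-error loss, which is the restriction the talk proposes to relax.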

This is joint work with Michinari Momma, Angela Zhang, Charles Lawrence, Curt Breneman, N. Sukumar, and Inna Vitoli.