Feature/Model Selection by the Linear Programming SVM
Erinija Pranckeviciene, Ray Somorjai, National Research Council Canada, and Muoi N. Tran, CancerCare Manitoba, Canada
erinija.pranckevie@nrc-nrcc.gc.ca

Many real-world classification problems are represented by very sparse, high-dimensional data. The recent successes of the linear programming support vector machine (LPSVM) for feature selection motivated a deeper analysis of the method when applied to sparse, multivariate data. Because of this sparseness, the selection of a classification model is strongly influenced by the characteristics of the particular dataset. In this study, we investigate a feature selection strategy that uses LPSVM as the initial feature filter, combined with state-of-the-art classification rules, and apply it to the five real-life datasets of the \textbf{Agnostic learning vs. prior knowledge} challenge of IJCNN 2007. Our goal is to better understand the robustness of LPSVM as a feature filter. Our analysis suggests that LPSVM can be a useful black-box method for identifying the profile of informative features in the data. If the data are complex and better separated by nonlinear methods, then feature pre-filtering by LPSVM enhances the data representation for other classifiers.
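
To make the filtering idea concrete, the following is a minimal sketch of an LPSVM-style feature filter. It assumes the common L1-norm soft-margin formulation (minimize the L1 norm of the weights plus C times the total slack), which may differ in detail from the formulation used in this study. The function name `lpsvm_filter`, the parameters `C` and `tol`, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def lpsvm_filter(X, y, C=1.0, tol=1e-6):
    """Fit an L1-norm linear SVM as a linear program (assumed formulation).

    Returns the weight vector w, the bias b, and the indices of features
    whose weight magnitude exceeds tol.
    """
    n, d = X.shape
    # Split w = u - v with u, v >= 0 so that ||w||_1 = sum(u + v) is linear.
    # Variable layout: [u (d), v (d), b_plus, b_minus, xi (n)], all >= 0.
    c = np.concatenate([np.ones(2 * d), [0.0, 0.0], C * np.ones(n)])
    yX = y[:, None] * X
    # Margin constraints y_i (w . x_i + b) >= 1 - xi_i, written as A_ub z <= -1.
    A_ub = np.hstack([-yX, yX, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    w = z[:d] - z[d:2 * d]
    b = z[2 * d] - z[2 * d + 1]
    selected = np.flatnonzero(np.abs(w) > tol)
    return w, b, selected


# Synthetic illustration (not data from the challenge): only feature 0
# carries the class label; the remaining nine features are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = np.where(X[:, 0] >= 0, 1.0, -1.0)
X[:, 0] += y  # push the classes apart along feature 0

w, b, selected = lpsvm_filter(X, y)
X_filtered = X[:, selected]  # reduced representation for a downstream classifier
```

The L1 penalty drives many weights to exactly zero, so the surviving indices in `selected` act as the feature profile, and `X_filtered` is what a subsequent, possibly nonlinear, classifier would be trained on.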