Feature Extraction: Foundations and Applications
This book is organized around the results of a benchmark that took place in 2003 (the website of the challenge is still active) and whose results were discussed at the NIPS 2003 workshop on feature extraction. Dozens of research groups competed on five large feature selection problems from various application domains: medical diagnosis, text processing, drug discovery, and handwriting recognition. The results of this effort pave the way for a new generation of methods capable of analyzing data tables with millions of lines and/or columns. This book is a step towards validating, unifying, and formalizing these approaches. The data and sample code with useful baseline methods implemented in Matlab are available as a single big archive (93 MB). Course material is also available.
FREE BOOK INTRODUCTION (courtesy of the publisher).
"This book compiles some very promising techniques in feature selection and supervised learning, coming from an extremely smart collection of researchers, delivering their best ideas in a competitive environment."
-- Trevor Hastie, Stanford University
"Feature Selection is a key technology for making sense of the high dimensional data which surrounds us. Isabelle Guyon et al. have done a splendid job in designing a challenging competition, and collecting the lessons learned."
-- Bernhard Schoelkopf, Max Planck Institute for Biological Cybernetics
Feature Extraction, one of the pillars of machine learning, finds application in biotechnology, industrial inspection, the Internet,
radar, sonar, and speech recognition, to name just a few areas. This new book, edited by Isabelle Guyon, Steve Gunn, Masoud Nikravesh and Lotfi Zadeh, is unique:
- The book owes its origin to a competition, followed by a Neural Information Processing Systems (NIPS) Workshop that was held in December 2003.
- Most important, the book embodies many of the state-of-the-art methods in feature extraction.
Simply put, the book will make a difference to the literature on machine learning.
-- Simon Haykin, McMaster University
Foreword by Peter Norvig
Everyone loves a good competition. As I write this, two billion fans are
eagerly anticipating the 2006 World Cup. Meanwhile, a fan base that is
somewhat smaller (but presumably includes you, dear reader) is equally eager
to read all about the results of the NIPS 2003 Feature Selection Challenge,
contained herein. Fans of Radford Neal and Jianguo Zhang (or of Bayesian
neural networks and Dirichlet diffusion trees) are gloating "I told you
so" and looking for proof that their win was not a fluke. But the matter
is by no means settled, and fans of SVMs are shouting "wait 'til next year!"
You know this book is a bit more edgy than your standard academic treatise
as soon as you see the dedication: "To our friends and foes."
Competition breeds improvement. Fifty years ago, the champion in 100m butterfly swimming was 22% slower than today's champion; the women's marathon champion from just 30 years ago was 26% slower. Who knows how much better our machine learning algorithms would be today if Turing in 1950 had proposed an effective competition rather than his elusive Test?
But what makes an effective competition? The field of Speech Recognition has had NIST-run competitions since 1988; error rates have been reduced by a factor of three or more, but the field has not yet had the impact expected of it. Information Retrieval has had its TREC competition since 1992; progress has been steady and refugees from the competition have played important roles in the hundred-billion-dollar search industry. Robotics has had the DARPA Grand Challenge for only two years, but in that time we have seen the results go from complete failure to resounding success (although it may have helped that the
second year's course was somewhat easier than the first's).
I think there are four criteria that define effective technical competitions: the problem must be approachable, the scoring must fairly discriminate among the entrants, the results must be open and reproducible, and the task must be relevant to the real world.
The Feature Selection Challenge meets the first three criteria easily.
75 teams entered, so they must have found it approachable. The scoring
did a good job of separating the top performers while keeping everyone on
the scale. And the results are all available online, in this book,
and on the accompanying CD. All the data and Matlab code are provided,
so the Challenge is easily reproducible. The level of explication provided
by the entrants in the chapters of this book is higher than in other similar
competitions. The fourth criterion, real-world relevance, is perhaps
the hardest to achieve. Only time will tell whether the Feature Selection
Challenge meets this one. In the meantime, this book sets a high standard
as the public record of an interesting and effective competition.
Palo Alto, CA
Table of Contents
An Introduction to Feature Extraction
Isabelle Guyon, André Elisseeff
Feature extraction includes feature construction, space dimensionality reduction, sparse representations, and feature selection. All these techniques are commonly used as preprocessing to machine learning and statistics tasks of prediction, including pattern recognition and regression. Although such problems have been tackled by researchers for many years, there has recently been renewed interest in feature extraction. A number of new applications with very large input spaces critically need space dimensionality reduction for efficiency and efficacy of the predictors. These applications include bioinformatics (DNA microarrays, mass-spectrometric data, etc.), combinatorial chemistry (e.g. high throughput screening of drug candidates), text processing (e.g. spam filtering), decision making (e.g. oil drilling), pattern recognition (e.g. handwriting recognition), speech processing, and vision.
An earlier tutorial by the same authors was published in JMLR.
Part I: Feature Extraction Fundamentals
The first part of the book consists of tutorial chapters that synthesize recent approaches and new theoretical advances.
1) Learning machines
Krzysztof Grabczewski & Norbert Jankowski
Brief tutorial on some of the most widely used learning machines (linear discriminants, naïve Bayes, neural networks, fuzzy logic, kernel methods -- including SVMs -- decision trees, and clustering). The goal of this chapter is to allow the reader to read the other chapters without referring to other textbooks.
2) Assessment methods
Fundamentals of statistical testing and cross-validation. The goal of this chapter is to provide the reader with the necessary tools to assess the goodness of selected features in a statistically significant way.
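The kind of assessment the chapter covers can be illustrated with a minimal k-fold cross-validation sketch, given here in Python (the book's own baseline code is in Matlab). The majority-class "learner" below is a hypothetical stand-in used only to exercise the harness; any train/predict pair can be plugged in.

```python
import random

def k_fold_cv(X, y, train_fn, predict_fn, k=5, seed=0):
    """Estimate accuracy of a learner by k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)          # fixed seed for reproducibility
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    accs = []
    for i in range(k):
        test = set(folds[i])
        train = [j for j in idx if j not in test]
        model = train_fn([X[j] for j in train], [y[j] for j in train])
        preds = [predict_fn(model, X[j]) for j in folds[i]]
        acc = sum(p == y[j] for p, j in zip(preds, folds[i])) / len(folds[i])
        accs.append(acc)
    return sum(accs) / k

# Trivial majority-class "learner": always predicts the most common label.
def train_majority(X, y):
    return max(set(y), key=y.count)

def predict_majority(model, x):
    return model
```

On a label vector that is 80% one class, the majority baseline scores about 0.8, which is the floor any selected feature set should beat.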
3) Filter methods
Correlation methods and other methods for ranking features in order of relevance. The goal of this chapter is to provide the reader with easy-to-implement methods, which are also important baseline methods.
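As a small illustration of the correlation-based ranking the chapter describes, here is a minimal Pearson-correlation filter, sketched in Python rather than the book's Matlab (function names are ours, not the book's):

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

def rank_features(X, y):
    """Rank feature indices by |correlation with the target|, best first."""
    n_feat = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feat)]
    order = sorted(range(n_feat), key=lambda j: -scores[j])
    return order, scores
```

Each feature is scored independently of the others, which is what makes filter methods fast, and also why they can miss features that are only jointly relevant.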
4) Search strategies
This chapter covers techniques that can be combined with any off-the-shelf learning machine package in a "wrapper" or "filter" feature selection setting, including exhaustive search (and computational and statistical limitations), classical greedy search (forward, backward), best first, branch and bound, beam search, GA, simulated annealing.
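A minimal sketch of the greedy forward search described above, in Python (not the book's own code): `score_fn` stands in for whatever the wrapper measures, typically cross-validated accuracy of an off-the-shelf learner on the candidate subset.

```python
def forward_selection(X, y, score_fn, max_features=None):
    """Greedy forward selection: repeatedly add the single feature whose
    addition most improves score_fn(selected_indices); stop when no
    candidate improves the score or the feature budget is reached."""
    n_feat = len(X[0])
    selected, best_score = [], float("-inf")
    limit = max_features or n_feat
    while len(selected) < limit:
        candidates = [j for j in range(n_feat) if j not in selected]
        scored = [(score_fn(selected + [j]), j) for j in candidates]
        top_score, top_j = max(scored)
        if top_score <= best_score:
            break  # no candidate improves the score; stop
        selected.append(top_j)
        best_score = top_score
    return selected
```

Like all greedy searches, this explores only a tiny fraction of the 2^n subsets; the chapter's best-first, branch and bound, and stochastic variants trade more computation for a wider search.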
5) Embedded methods
Navin Lal, Olivier Chapelle, Jason Weston, Andre Elisseeff
This chapter covers algorithms that select features in the process of learning, including gradient based feature scaling; nested subset methods; use of performance bounds; direct objective optimization; decision trees.
6) Information theoretic methods
Methods derived from information theoretic principles both for feature selection and construction, including information bottleneck, and Markov blankets.
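As a hedged illustration of the quantities these methods rank features by, here is an empirical mutual-information estimate for discrete variables in Python (a sketch, not the chapter's own code):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two discrete
    variables, given as parallel sequences of observations."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts folded in
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi
```

A feature identical to the binary target yields 1 bit; an independent feature yields 0. Plug-in estimates like this one are biased upward on small samples, one reason the chapter's principled treatments matter.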
7) Ensemble learning
Methods involving voting over several feature subsets. Bayesian approaches to feature selection, bagging/stability methods.
8) Fuzzy Neural Networks
Madan M. Gupta, Noriyasu Homma, Zeng-Guang Hou
A new generation of learning machines combining Fuzzy Logic and neural networks.
Part II: Feature Selection Challenge
The second part of the book includes the results of the NIPS 2003 feature selection challenge. See the slides of the NIPS 2003 workshop.
9) Design and Analysis of the NIPS2003 Challenge
Isabelle Guyon, Steve Gunn, Asa Ben Hur, Gideon Dror
The challenge attracted 75 competitors. Researchers all over the world competed for 12 weeks on 5 sizeable datasets from a variety of applications, with numbers of input features ranging from 500 to 100,000. The goal was to use as few input features as possible without sacrificing prediction performance. The competitors succeeded in reducing the input space dimensionality by orders of magnitude while retaining or improving performance. Some of the most efficient methods are among the simplest and fastest. The benefits of using such techniques include performance improvements in terms of prediction speed and accuracy as well as gaining better data understanding by identifying the inputs that are most predictive of a given outcome. A short version was published in the NIPS proceedings.
10) High Dimensional Classification with Bayesian Neural Networks and Dirichlet Diffusion Trees
Radford M. Neal, Jianguo Zhang
11) Ensembles of Regularized Least Squares Classifiers for High-Dimensional Problems
Kari Torkkola and Eugene Tuv
12) Combining SVMs with Various Feature Selection Strategies
Yi-Wei Chen, Chih-Jen Lin
13) Feature Selection with Transductive Support Vector Machines
Zhili Wu, Chunhung Li
14) Variable Selection using Correlation and Single Variable Classifier Methods: Applications
Amir Reza Saffari Azar Alamdari
15) Tree-Based Ensembles with Dynamic Soft Feature Selection
Alexander Borisov, Victor Eruhimov, Eugene Tuv
16) Sparse, Flexible and Efficient Modeling using L1 Regularization
Saharon Rosset, Ji Zhu
17) Margin Based Feature Selection and Infogain with Standard Classifiers
Ran Gilad-Bachrach, Amir Navot
18) Bayesian Support Vector Machines for Feature Ranking and Selection
Wei Chu, S. Sathiya Keerthi, Chong Jin Ong, Zoubin Ghahramani
19) Nonlinear Feature Selection with the Potential Support Vector Machine
Sepp Hochreiter, Klaus Obermayer
20) Combining a Filter Method with SVMs
Thomas Navin Lal, Olivier Chapelle, Bernhard Schölkopf
21) Feature Selection via Sensitivity Analysis with Direct Kernel PLS
Mark J. Embrechts, Robert A. Bress, Robert H. Kewley
22) Information Gain, Correlation and Support Vector Machines
Danny Roobaert, Grigoris Karakoulas, Nitesh V. Chawla
23) Mining for Complex Models Comprising Feature Selection and Classification
Krzysztof Grabczewski, Norbert Jankowski
24) Combining Information-Based Supervised and Unsupervised Feature Selection
Sang-Kyun Lee, Seung-Joon Yi, Byoung-Tak Zhang
25) An Enhanced Selective Naïve Bayes Method with Optimal Discretization
26) An Input Variable Importance Definition based on Empirical Data Probability Distribution
V. Lemaire, F. Clérot
Part III: New Perspectives in Feature Extraction
The last part of the book is devoted to recent advances in feature extraction.
27) Spectral Dimensionality Reduction
Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-Francois Paiement, Pascal Vincent, Marie Ouimet
28) Constructing Orthogonal Latent Features for Arbitrary Loss
Michinari Momma, Kristin P. Bennett
29) Large Margin Principles for Feature Selection
Ran Gilad-Bachrach, Amir Navot, Naftali Tishby
30) Feature Extraction for Classification of Proteomic Mass Spectra: A Comparative Study
Ilya Levner, Vadim Bulitko, Guohui Lin
31) Sequence motifs: highly predictive features of protein function
Asa Ben-Hur, Douglas Brutlag
A. Elementary Statistics
B. Feature Selection Challenge Datasets
C. Feature Selection Challenge Fact Sheets
D. Feature Selection Challenge Results Tables