NIPS 2003 workshop on feature extraction and feature selection challenge

December 11-13, 2003
Whistler, British Columbia, Canada

*** Challenge result analysis ***


Background

Recently, much research effort has been devoted to the field of feature extraction. In the past few years, the number of papers related to feature extraction, including feature construction, space dimensionality reduction, sparse representations, and feature selection, has approached ten percent of NIPS submissions. The applications studied cover a wide range of domains, including bioinformatics, chemistry, text processing, pattern recognition, speech processing, and vision. Yet there does not seem to be an emerging unity, whether from the standpoint of experimental design, algorithms, or theoretical analysis. The purpose of the workshop is to bring together researchers from various application domains to share techniques and methods.

Part of the workshop will be devoted to presentations and discussions of the results of a challenge on feature selection. Results published in the field of feature selection have, for the most part, used different data sets or different data splits, which makes them hard to compare. We formatted a number of datasets for the purpose of benchmarking feature selection algorithms in a controlled manner. The data sets were chosen to span a wide variety of domains, and each has sufficiently many examples to create a test set large enough to obtain statistically significant results. The input variables are continuous or binary, sparse or dense. All problems are two-class classification problems. The similarity of the tasks allows participants to enter results on all data sets to test the generality of their algorithms.

Challenge
The NIPS 2003 challenge in feature selection is to find feature selection algorithms that significantly outperform methods using all features, benchmarked on ALL five datasets formatted for that purpose. To facilitate entering results for all five datasets, all tasks are two-class classification problems. During the development period, participants may submit validation set results on a subset of the datasets.

How to participate:
Simply download the five datasets from the challenge web site.

 
File          Size (MB)  Type            Num. ex. (tr/val/te)  Num. feat.
ARCENE.zip    8.7        Non sparse      100/100/700           10000
DEXTER.zip    0.9        Sparse integer  300/300/2000          20000
DOROTHEA.zip  4.7        Sparse binary   800/350/800           100000
GISETTE.zip   22.5       Non sparse      6000/1000/6500        5000
MADELON.zip   2.9        Non sparse      2000/600/1800         500

If you are a Matlab user, we provide sample code to read and check the data. Otherwise, the data follow a straightforward ASCII format. Check the latest challenge results.

Each dataset is split into a training, a validation, and a test set. Only the training labels are provided. During the development period, participants can return classification results on the validation set, even for a subset of the datasets; they will receive their validation set scores in return. At any time (but presumably after some development period) the participants can submit their final classification results on ALL the datasets (with a limit of five submissions per person).
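As a rough illustration, the ASCII files might be parsed as follows. The function names and the exact file layout assumed here (one whitespace-separated example per line for dense data, one list of active-feature indices per line for sparse binary data, one +1/-1 label per line) are assumptions for the sake of the sketch — check the format description that ships with each archive.

```python
def load_dense(data_path):
    """Read a dense .data file: one example per line,
    whitespace-separated numeric feature values."""
    with open(data_path) as f:
        return [[float(v) for v in line.split()]
                for line in f if line.strip()]

def load_sparse_binary(data_path):
    """Read a sparse binary .data file: each line lists the indices
    of the active (non-zero) features of one example."""
    with open(data_path) as f:
        return [set(int(i) for i in line.split())
                for line in f if line.strip()]

def load_labels(label_path):
    """Read a .labels file: one +1/-1 label per line."""
    with open(label_path) as f:
        return [int(line) for line in f if line.strip()]
```

Validation and test label files are withheld, so only the corresponding .data files can be read during development.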

Closing deadline:
CLOSED. The submission deadline was: December 1st, 2003.

Questions: Check our challenge FAQ.

Submission for a workshop presentation
The workshop is open to contributions related to feature extraction at large, including theoretical and practical contributions on feature construction, space dimensionality reduction, and feature selection. Participating in the challenge is not a prerequisite for submitting an abstract, but some priority will be given to challenge participants with competitive methods. Abstracts of less than one page should be sent to nips2003@clopinet.com

CLOSED: The deadline to submit abstracts was:  December 1, 2003.
The workshop was a success: we had 17 presentations and 98 participants.
Springer will publish the best papers as part of an edited book on feature extraction.

Schedule

 Friday Dec. 12, morning session 7:30am-10:30am
Feature Selection
Chair: Steve Gunn

7:30am Benchmark datasets and challenge result summary 
[slides] [dataset description] [December 1st results] [December 8th results]
Isabelle Guyon, Steve Gunn, Asa Ben Hur, and Gideon Dror

7:50am Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees
[abstract] [slides]
Radford M. Neal and Jianguo Zhang

8:20am Random Forests and Regularized Least Squares Classifiers
[abstract] [slides]
Kari Torkkola and Eugene Tuv

8:40am Feature Selection using SVM and Random Forest
[abstract] [slides]
Yi-Wei Chen and Chih-Jen Lin

9:00am Break

9:10am Feature Selection using Transductive Support Vector Machine
[abstract] [slides]
Zhi-li Wu and Chun-hung Li

9:30am Boosting Flexible Learning Ensembles with Dynamic Feature Selection
[abstract] [slides]
Alexander Borisov, Victor Eruhimov and Eugene Tuv

9:50am Piecewise Linear Regularized Solution Paths
[abstract] [slides]
Saharon Rosset and Ji Zhu

10:10am Feature Selection with Sensitivity Analysis for Direct Kernel Partial Least Squares (DK-PLS)
[abstract] [slides]
Mark J. Embrechts

Friday Dec. 12, afternoon session 4:00pm-7:00pm
Feature Extraction
Chair: Kristin Bennett

4:00pm Spectral Dimensionality Reduction via Learning Eigenfunctions
[abstract] [slides]
Yoshua Bengio

4:30pm Protein Sequence Motifs: Highly Discriminative Features for Function Prediction
[abstract] [slides]
Asa Ben Hur

4:50pm Feature Construction: Variations on PCA and Company
[abstract]
K. Bennett

5:10pm Feature Extraction for Image Interpretation
[abstract] [slides]
Ilya Levner and Vadim Bulitko

5:30pm Break

5:40pm Feature Extraction with Description Logics Functional Subsumption
[abstract] [slides]
Rodrigo de Salvo Braz and Dan Roth

6:00pm Feature Selection with the Potential Support Vector Machine
[abstract] [slides]
Sepp Hochreiter

6:20pm Information Based Supervised and Semi-Supervised Feature Selection
[abstract]
Sang-Keun Lee, Seung-Joon Yi and Byoung-Tak Zhang

6:40pm Lessons Learned from the Feature Selection Competition
[abstract][pdf abstract]
Nitesh V. Chawla, Grigoris Karakoulas, and Danny Roobaert

6:55pm Method description
[slides]
Thomas Navin Lal and Olivier Chapelle

Information from challenge participants not coming to the workshop:

Nameless: Feature Selection Challenge Attempt
[slides]
Ran Gilad-Bachrach and Amir Navot

NIPS Feature Selection Challenge: Details On Methods
[report]
Amir Reza Saffari Azar
 

Links 
Special issue of JMLR on variable and feature selection
The Journal of Machine Learning Research (http://www.jmlr.org/) published this year the proceedings of the NIPS 2001 workshop on variable and feature selection, together with other contributions on that topic. The issue, organized and edited by Isabelle Guyon and André Elisseeff, contains 14 papers, including an introduction to the field by the guest editors. In addition, many of the authors have made available the data sets and software used in their research.

Data mining competitions:
A list of data mining competitions maintained by KDnuggets, including the well known KDD cup.

List of datasets for machine learning:
A rather comprehensive list maintained by MLnet.

On-line machine learning resources:
Includes pointers to software and data. The collections include the famous UCI repositories, the DELVE platform of University of Toronto, and other resources.

CAMDA
Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a contest with emphasis on gene selection, a special case of feature selection.

ICDAR
International Conference on Document Analysis and Recognition, a biennial conference that proposes a contest in printed text recognition. Feature extraction/selection is a key component of winning such a contest.

TREC
Text REtrieval Conference, organized every year by NIST. The conference is organized around the results of a competition. Past winners have had to address feature extraction/selection effectively.

ICPR
In conjunction with the International Conference on Pattern Recognition, ICPR 2004, a face recognition contest is being organized.

CASP
An important competition in protein structure prediction, called Critical Assessment of Techniques for Protein Structure Prediction.

Contact information 
Workshop chair:
Isabelle Guyon
Clopinet Enterprises
955, Creston Road,
Berkeley, CA 94708, U.S.A.
Tel/Fax: (510) 524 6211

Other organizers: 
Proceedings publication: Masoud Nikravesh.
Program advisors: Kristin Bennett, Richard Caruana.
Challenge assistants: Asa Ben-Hur, André Elisseeff, Gideon Dror.
Challenge webmaster: Steve Gunn.

Acknowledgments:
We thank the people who made the data we are using publicly available. They will be acknowledged by name at the end of the challenge, when we reveal the identity of the datasets.