
Model Selection
workshop and
Performance Prediction Challenge

Tuesday, July 18, 2006
Vancouver, British Columbia, Canada

*** Challenge results available ***


Model selection is a problem in statistics, machine learning, and data mining. Given training data consisting of input-output pairs, a model is built to predict the output from the input, usually by fitting adjustable parameters.  Many predictive models have been proposed to perform such tasks, including linear models, neural networks, classification and regression trees, and kernel methods. The selection of an optimal model, which should perform best on test data, is the object of this workshop. A related problem is to find an optimal ensemble of models forming a committee and voting for the final decision according to given scores. Contributions to ensemble methods are also within the scope of the workshop.
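As a purely illustrative sketch of the model selection problem (not any participant's actual method), the snippet below chooses among k-nearest-neighbor classifiers with different values of k by comparing their error rates on a held-out validation set; the synthetic two-Gaussian data and all names are invented for the example.

```python
import random

random.seed(0)

def make_data(n):
    """Synthetic two-class data: Gaussian clouds centered at (-1,-1) and (1,1)."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        mu = 1.0 if y == 1 else -1.0
        x = (random.gauss(mu, 1.5), random.gauss(mu, 1.5))
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train,
                     key=lambda p: (p[0][0] - x[0])**2 + (p[0][1] - x[1])**2)
    votes = sum(y for _, y in nearest[:k])
    return 1 if 2 * votes > k else 0

def error_rate(k, train, data):
    wrong = sum(1 for x, y in data if knn_predict(train, x, k) != y)
    return wrong / len(data)

train, valid = make_data(200), make_data(100)

# Model selection: rank the candidate models by validation error, keep the best.
scores = {k: error_rate(k, train, valid) for k in (1, 3, 9, 27)}
best_k = min(scores, key=scores.get)
print("selected k =", best_k, "validation error =", scores[best_k])
```

The same recipe applies unchanged if the candidates are heterogeneous (a linear model, a tree, a kernel method): each model is scored by the same validation criterion and the minimizer is selected.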

Part of the workshop will be devoted to the results of a challenge on performance prediction:
                   How good are you at predicting how good you are?

In most real-world situations it is not sufficient to provide a good predictor; it is equally important to assess accurately how well that predictor will perform on new, unseen data. Before deploying a model in the field, one must know whether it will meet the specifications or whether one should invest more time and resources to collect additional data and/or develop more sophisticated models. The performance prediction challenge asks you to provide prediction results on new, unseen test data AND to predict how good those predictions are. You must therefore design both a good predictive model and a good performance estimator.
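A minimal sketch of the idea, under invented assumptions (the actual challenge used its own data sets and scoring based on the balanced error rate): train a simple nearest-centroid classifier, use a held-out validation split to produce a guessed test error, then compare that guess to the error actually measured on a large test set.

```python
import random
import statistics

random.seed(1)

def sample(n):
    """Two Gaussian classes centered at (-1,-1) and (1,1)."""
    pts = []
    for _ in range(n):
        y = random.randint(0, 1)
        mu = 1.0 if y else -1.0
        pts.append(((random.gauss(mu, 1), random.gauss(mu, 1)), y))
    return pts

def centroid(points):
    return (statistics.mean(x[0] for x, _ in points),
            statistics.mean(x[1] for x, _ in points))

def fit(train):
    """Nearest-centroid classifier: predict the class whose mean is closest."""
    c0 = centroid([p for p in train if p[1] == 0])
    c1 = centroid([p for p in train if p[1] == 1])
    def predict(x):
        d0 = (x[0] - c0[0])**2 + (x[1] - c0[1])**2
        d1 = (x[0] - c1[0])**2 + (x[1] - c1[1])**2
        return 1 if d1 < d0 else 0
    return predict

def err(predict, data):
    return sum(predict(x) != y for x, y in data) / len(data)

train, valid, test = sample(200), sample(200), sample(5000)
model = fit(train)

guessed = err(model, valid)   # our performance prediction
actual = err(model, test)     # error measured on the large test set
print("guessed:", guessed, "actual:", actual, "gap:", abs(guessed - actual))
```

A good performance estimator keeps the gap between the guessed and actual error small; a scoring rule of this kind rewards accurate self-assessment, not just a low test error.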

The performance prediction challenge is connected to model selection because accurate performance predictions are good model ranking criteria. We formatted five data sets for the purpose of benchmarking performance prediction in a controlled manner. The data sets span a wide variety of domains and have sufficiently many test examples to obtain statistically significant results.

The WCCI 2006 performance prediction challenge is to obtain a good predictor AND predict how well it will perform on a large test set. Entrants must provide results on ALL five data sets provided. To facilitate entering results for all five data sets, all tasks are two-class classification problems. During the development period, participants may submit results on a validation subset of the data to obtain immediate feedback. The final ranking will be performed on a separate test set.

How to participate:
The challenge ran from September 30, 2005 to March 1, 2006. The challenge is now over; check the results.
The challenge web site will soon reopen for post-challenge submissions.

Some of the participants and organizers at WCCI 2006

Participation in the workshop
Participation in the workshop is not contingent on entering the challenge. Likewise, challenge entrants are not required to attend the workshop nor to publish the methods they employed. Challenge entrants may remain anonymous during the development period, but only identified entrants will be included in the final competition ranking.

Paper submission to the workshop is now closed.
The best contributions will be invited to submit a paper to a special topic of the Journal of Machine Learning Research. Participants are also encouraged to submit negative results to the Journal of Interesting Negative Results.

Workshop schedule

The model selection papers presented at the conference are linked below for convenience. Contact the organizers to gain access if you are entitled to it.

Session TueAM-5: IJCNN Competition Program- Performance Prediction Challenge I
Tuesday, July 18, 8:00AM-10:00AM, Room: Junior Ballroom

8:00am -- Performance Prediction Challenge
Isabelle Guyon, Amir Reza Saffari Azar Alamdari, Gideon Dror and Joachim Buhmann [Slides][Paper]
8:20am -- LogitBoost with Trees Applied to the WCCI 2006 Performance Prediction Challenge Datasets
Roman Lutz [Slides][Paper]
8:40am -- Leave-one-out Cross-validation Based Model Selection Criteria for Weighted LS-SVMs
Gavin Cawley [Slides][Paper]
9:00am -- Classification with Tree-based Ensembles Applied to the WCCI 2006 Performance Prediction Challenge Datasets
Corinne Dahinden [Slides][Paper]
9:20am -- Model Selection: An Empirical Study on Two Kernel Classifiers
Wei Chu [Slides][Paper]
9:40am -- Regularization and Averaging of the Selective Naive Bayes Classifier
Marc Boullé [Slides][Paper]
Session TueMM-5: S4: Model Selection
Tuesday, July 18, 1:00PM-3:00PM, Room: Junior Ballroom D

1:00pm -- Nonlinear Model Selection Based on the Modulus of Continuity
                Imhoi Koo and Rhee Kil [Slides][Paper]
1:20pm -- Semi-supervised Model Selection Based on Cross-Validation
                Matti Kaariainen [Slides][Paper]
1:40pm -- New Formulation of SVM for Model Selection
                Mathias Adankon and Mohamed Cheriet [Slides][Paper]
2:00pm -- Common Subset Selection of Inputs in Multiresponse Regression
                Timo Simila and Jarkko Tikka [Slides][Paper]
2:20pm -- Breakdown Point of Model Selection When the Number of Variables Exceeds the Number of Observations
                David Donoho and Victoria Stodden [Slides][Paper]
2:40pm -- Model Selection via Bilevel Optimization
                Kristin Bennett, Jing Hu, Xiaoyun Ji, Gautam Kunapuli and Jong-Shi Pang [Slides][Paper]
Session TuePM-5: IJCNN Competition Program- Performance Prediction Challenge II
Tuesday, July 18, 3:15PM-5:15PM, Room: Junior Ballroom D

3:15pm -- Feature Selection Using Ensemble Based Ranking Against Artificial Contrasts
Eugene Tuv, Alexander Borisov and Kari Torkkola [Slides][Paper]
3:35pm -- Model Selection in an Ensemble Framework
Joerg D. Wichard [Slides][Paper]
3:55pm -- Learning with Mean-variance Filtering, SVM and Gradient-based Optimization
Vladimir Nikulin [Slides] [Talk][Paper]
4:15pm -- A Study of Supervised Learning with Multivariate Analysis on Unbalanced Datasets
Yu-Yen Ou, Hao-Geng Hung and Yen-Jen Oyang [Slides][Paper]
4:35pm - Competition Panel: An open discussion on the results of the competition and planning for future such events.[Log of the discussion]


NIPS 2003 workshop on feature extraction and feature selection challenge. We organized a competition on five data sets which received hundreds of entries. The web site of the challenge is still available for post-challenge submissions. Measure yourself against the winners!

Pascal challenges: The Pascal network sponsors several challenges in machine learning.

Data mining competitions:
A list of data mining competitions maintained by KDnuggets, including the well known KDD cup.

List of data sets for machine learning:
A rather comprehensive list maintained by MLnet.

On-line machine learning resources:
Includes pointers to software and data. The collections include the famous UCI repositories, the DELVE platform of University of Toronto, and other resources.

Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a contest with emphasis on gene selection, a special case of feature selection.

International Conference on Document Analysis and Recognition, a biennial conference featuring a contest in printed text recognition. Feature extraction/selection is a key component of winning such a contest.

Text Retrieval Conference, organized every year by NIST. The conference is organized around the results of a competition. Past winners have had to address feature extraction/selection effectively.

In conjunction with the International Conference on Pattern Recognition, ICPR 2004, a face recognition contest is being organized.

An important competition in protein structure prediction, called Critical Assessment of Techniques for Protein Structure Prediction.

Contact information

Principal Investigator:
Isabelle Guyon
Clopinet Enterprises
955, Creston Road,
Berkeley, CA 94708, U.S.A.
Tel/Fax: (510) 524 6211

Collaborators and advisors: Steve Gunn (University of Southampton), Yoshua Bengio (University of Montréal), Asa Ben-Hur (Colorado State University), Joachim Buhmann (ETH, Zurich),  Gideon Dror (Academic College of Tel-Aviv-Yaffo),  Olivier Guyon (MisterP services), Amir Reza Saffari Azar (Graz University of Technology), Lambert Schomaker (University of Groningen), and Vladimir Vapnik (NEC, Princeton).

This project is supported by the National Science Foundation under Grant No. ECS-0424142. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.