Background
Model selection is a central problem in statistics, machine learning, and data mining. Given training data consisting of input-output pairs, a model is built to predict the output from the input, usually by fitting adjustable parameters. Many predictive models have been proposed for such tasks, including linear models, neural networks, classification and regression trees, and kernel methods. The selection of an optimal model, one that performs best on test data, is the focus of this workshop. A related problem is finding an optimal ensemble of models that form a committee and vote on the final decision according to given scores. Contributions on ensemble methods are also within the scope of the workshop.
Part of the workshop will be devoted to the results of a challenge on performance prediction:
How good are you at predicting how good you are?
In most real-world situations, it is not sufficient to provide a good predictor; it is also important to assess accurately how well that predictor will perform on new, unseen data. Before deploying a model in the field, one must know whether it will meet the specifications, or whether one should invest more time and resources to collect additional data and/or develop more sophisticated models. The performance prediction challenge asks you to provide prediction results on new, unseen test data AND to predict how good these predictions are. Therefore, you must design both a good predictive model and a good performance estimator.
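To make the task concrete, here is a minimal sketch in Python of what such an entry involves: a predictor together with a cross-validation-based guess of its error on unseen data. This is not the official evaluation code; the synthetic data set, the logistic regression model, and the use of the balanced error rate as the figure of merit are illustrative assumptions.

    # A minimal sketch (not the official challenge code): build a predictor
    # and *predict* its own error on unseen test data via cross-validation.
    # Data set, model, and metric choices here are illustrative assumptions.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import cross_val_score, train_test_split

    # Synthetic two-class problem standing in for one of the challenge data sets.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0)

    model = LogisticRegression(max_iter=1000)

    # Performance estimator: cross-validated balanced accuracy on the training
    # data, turned into a guess of the balanced error rate (BER) on the test set.
    cv_scores = cross_val_score(model, X_train, y_train, cv=10,
                                scoring="balanced_accuracy")
    ber_guess = 1.0 - cv_scores.mean()

    # The actual test BER, which an entrant would not see before submitting.
    model.fit(X_train, y_train)
    ber_test = 1.0 - balanced_accuracy_score(y_test, model.predict(X_test))

    print(f"predicted BER: {ber_guess:.3f}  actual test BER: {ber_test:.3f}")

A good performance estimator is one for which the two printed numbers agree closely.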
The performance prediction challenge
is connected to model selection because accurate performance predictions
are good model ranking criteria. We formatted five data sets for the purpose
of benchmarking performance prediction in a controlled manner. The data sets
span a wide variety of domains and have sufficiently many test examples to
obtain statistically significant results.
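As an illustration of this connection, the sketch below ranks a handful of candidate models by their cross-validated error estimates and selects the one with the lowest estimate. The data and the candidate set, which echoes the model families listed above, are again assumptions made for the example.

    # A minimal sketch: an accurate performance estimate is, by construction,
    # a good criterion for ranking candidate models. Candidates are illustrative.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

    # Candidate models echoing the families mentioned above (linear, tree, kernel).
    candidates = {
        "linear": LogisticRegression(max_iter=1000),
        "tree": DecisionTreeClassifier(max_depth=5),
        "kernel": SVC(kernel="rbf"),
    }

    # Estimate each model's error rate as 1 minus its cross-validated accuracy,
    # then select the candidate with the lowest estimated error.
    estimates = {name: 1.0 - cross_val_score(clf, X, y, cv=10).mean()
                 for name, clf in candidates.items()}

    for name, err in sorted(estimates.items(), key=lambda kv: kv[1]):
        print(f"{name}: estimated error {err:.3f}")
    print("selected:", min(estimates, key=estimates.get))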
Challenge
The WCCI 2006 performance prediction challenge is to obtain a good predictor AND to predict how well it will perform on a large test set. Entrants must provide results on ALL five data sets. To facilitate entering results for all five data sets, all tasks are two-class classification problems. During the development period, participants may submit results on a validation subset of the data to obtain immediate feedback. The final ranking will be performed on a separate test set.
How to participate:
The challenge ran from September 30, 2005 until March 1, 2006. The challenge is now over; check the results. The challenge web site will soon reopen for post-challenge submissions.
(Photo: some of the participants and organizers at WCCI 2006)
Authors of the best contributions will be invited to submit a paper to a special topic of the Journal of Machine Learning Research. Participants are also encouraged to submit negative results to the Journal of Interesting Negative Results.
Workshop schedule
The modelselect papers presented at the conference are linked below for convenience. Contact the organizers to gain access if you are entitled to it.
Session TueAM-5: IJCNN Competition Program - Performance Prediction Challenge I
Tuesday, July 18, 8:00AM-10:00AM, Room: Junior Ballroom
8:00am -- Performance Prediction Challenge
Isabelle Guyon, Amir Reza Saffari Azar Alamdari, Gideon Dror and Joachim
Buhmann [Slides][Paper]
8:20am -- LogitBoost with Trees Applied to the WCCI 2006
Performance Prediction Challenge Datasets
Roman Lutz [Slides][Paper]
8:40am -- Leave-one-out Cross-validation Based Model Selection
Criteria for Weighted LS-SVMs
Gavin Cawley [Slides][Paper]
9:00am -- Classification with Tree-based Ensembles Applied
to the WCCI 2006 Performance Prediction Challenge Datasets
Corinne Dahinden [Slides][Paper]
9:20am -- Model Selection: An Empirical Study on Two Kernel
Classifiers
Wei Chu [Slides][Paper]
9:40am -- Regularization and Averaging of the Selective
Naive Bayes Classifier
Marc Boullé [Slides][Paper]
Session TueMM-5: S4: Model Selection
Tuesday, July 18, 1:00PM-3:00PM, Room: Junior Ballroom D
1:00pm -- Nonlinear Model Selection Based on the Modulus
of Continuity
Imhoi Koo and Rhee Man Kil [Slides][Paper]
1:20pm -- Semi-supervised Model Selection Based on Cross-Validation
Matti Kääriäinen [Slides][Paper]
1:40pm -- New Formulation of SVM for Model Selection
Mathias Adankon and Mohamed Cheriet [Slides][Paper]
2:00pm -- Common Subset Selection of Inputs in Multiresponse
Regression
Timo Similä and Jarkko Tikka [Slides][Paper]
2:20pm -- Breakdown Point of Model Selection When the Number
of Variables Exceeds the Number of Observations
David Donoho and Victoria Stodden [Slides][Paper]
2:40pm -- Model Selection via Bilevel Optimization
Kristin Bennett, Jing Hu, Xiaoyun Ji, Gautam Kunapuli and Jong-Shi Pang
[Slides][Paper]
Session TuePM-5: IJCNN Competition Program - Performance Prediction Challenge II
Tuesday, July 18, 3:15PM-5:15PM, Room: Junior Ballroom D
3:15pm -- Feature Selection Using Ensemble Based Ranking
Against Artificial Contrasts
Eugene Tuv, Alexander Borisov and Kari Torkkola [Slides][Paper]
3:35pm -- Model Selection in an Ensemble Framework
Joerg D. Wichard [Slides][Paper]
3:55pm -- Learning with Mean-variance Filtering, SVM and
Gradient-based Optimization
Vladimir Nikulin [Slides] [Talk][Paper]
4:15pm -- A Study of Supervised Learning with Multivariate
Analysis on Unbalanced Datasets
Yu-Yen Ou, Hao-Geng Hung and Yen-Jen Oyang [Slides][Paper]
4:35pm -- Competition Panel: An open discussion on the results of the competition and planning for future such events. [Log of the discussion]
Links
NIPS 2003 workshop
on feature extraction and feature selection challenge. We organized
a competition on five data sets in which hundreds of entries were
made. The web site of the challenge is still available for post-challenge
submissions. Measure yourself against the winners!
Pascal challenges: The Pascal network is sponsoring several challenges in machine learning.
Data mining competitions: A list of data mining competitions maintained by KDnuggets, including the well-known KDD Cup.
List of data sets for machine learning: A rather comprehensive list maintained by MLnet.
On-line machine learning resources: Includes pointers to software and data. The collections include the famous UCI repositories, the DELVE platform of the University of Toronto, and other resources.
CAMDA
Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a contest with emphasis on gene selection, a special case of feature selection.
ICDAR
International Conference on Document Analysis and Recognition, a biennial conference proposing a contest in printed text recognition. Feature extraction/selection is a key component of winning such a contest.
TREC
Text REtrieval Conference, organized every year by NIST. The conference is built around the results of a competition. Past winners have had to address feature extraction/selection effectively.
ICPR
In conjunction with the International Conference on Pattern Recognition, ICPR 2004, a face recognition contest was organized.
CASP
An important competition in protein structure prediction called the Critical Assessment of Techniques for Protein Structure Prediction.
Contact information
Principal Investigator:
Isabelle Guyon
Clopinet Enterprises
955 Creston Road,
Berkeley, CA 94708, U.S.A.
Tel/Fax: (510) 524 6211
Collaborators and advisors: Steve Gunn (University of Southampton), Yoshua Bengio (University
of Montréal), Asa Ben-Hur (Colorado State University), Joachim
Buhmann (ETH, Zurich), Gideon Dror (Academic College of
Tel-Aviv-Yaffo), Olivier Guyon
(MisterP services), Amir Reza Saffari Azar (Graz University of Technology), Lambert Schomaker (University
of Groningen), and Vladimir
Vapnik (NEC, Princeton).
This project is supported by the National Science Foundation under Grant No. ECS-0424142. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.