WCCI 2006 workshop on model selection

Summary of the discussion session

1) Kari: How can one get the right data? This is the main problem in organizing a challenge.
2) Leandro: Given raw data (time series are more interesting), the main question is how to extract proper features from them, especially features that are not obvious to outsiders or even to domain experts. Also, in many cases the IID assumption does not hold for real-world data.
3) Matti: How can one make a proper ranking function for challenge participants? Maybe a multi-objective cost function would be interesting for this purpose (see the ranking sketch after this list).
4) Wei Chu: Having a wide range of datasets in the challenge might be interesting, from sets with a very small number of examples up to large sets.
5) Cilinia: Introducing various controlled difficulties into the datasets might be interesting for a challenge. For example, in many real-world applications we see sampling bias, a small number of examples, a large number of features, noisy features, missing data, and no available ground truth (even the class labels carry uncertainty). So an idea would be to have synthetic data with a controlled way of introducing those difficulties and uncertainties into the data sets (see the data-generation sketch after this list).
6) Ou: An interesting problem to address is to test different classifiers on different datasets, to see which ones have advantages over the others on which domains of data.
7) Wichard: As in previous challenges, keeping the sets and the competition general is important, so that those without domain knowledge can participate on equal terms. This attracts more people to the challenge. A starter software package is also desirable. One idea for the next challenge is to pose a problem with missing and/or changing (non-IID) data. Better advertising and announcement of upcoming challenges is also desirable.
8) Corinne: Having secret datasets, with no information about their source, is better than datasets from specialized domains.
9) Vladimir: In other competitions, the source of the datasets is available from the beginning, but they usually focus each year on just one type of source. Taking the run-time requirements of algorithms into account might also be a good idea, for example a ranking method that gives negative scores to algorithms that need a lot of computation (see the ranking sketch after this list).
10) Roman: If we use real-life datasets, then there is no need to add noise or random probes to the data.
11) Marc: Mixed datasets are better, e.g. having both binary and real-valued features. It might also be a good idea to use the AUC instead of the BER as the performance measure (see the BER/AUC sketch after this list). In addition, some experiments on different dataset sizes (the number of examples especially) might be interesting; sometimes having more examples can be dangerous. One could also use structured data instead of a prepared matrix format.
12) Yi: Diversity in the datasets is more interesting: different categories, different sizes, different +1/-1 class ratios, a large number of features, some unlabeled examples. Also, an objective that matters more in some real-world applications is, for example, detecting all positives while minimizing the error on the remaining negatives.
13) Crowne: We have some datasets, usually on time-series forecasting. It might be interesting to bring different domains together. There is a data mining committee in the IEEE Computational Intelligence Society.
14) Gawin: Participants should submit just one solution, instead of multiple models, because it is difficult to make sense of multiple submissions. Having an official registration form for participants to sign up before submitting final results is a good idea.
15) Stefan: It would be interesting to have a competition with subfields, each specialized in one domain, and then to compare the results with each other. Also, if a competition is relevant and interesting to some companies, it is possible to get real-world data by negotiating with those companies.
16) Eugene: It is important to organize challenges at highly ranked conferences, like NIPS. Regarding the competition itself, it would be interesting to have parametric data, missing data, a large number of features, structured data, multi-class problems, complicated cost functions, run-time restrictions (algorithms that take hours or days to finish are not acceptable in most companies), interpretability of the features (it is not sufficient to have just any features, but features that make sense to an expert), and fewer parameters to tune in the algorithms.
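
Data-generation sketch (for item 5): a minimal Python/NumPy illustration, not something presented at the workshop, of how controlled difficulties could be injected into an otherwise clean dataset. The function name add_difficulties and all the rates and counts are arbitrary assumptions chosen only for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def add_difficulties(X, y, label_noise=0.05, missing_rate=0.10, n_probes=20):
        """Inject controlled difficulties into a clean (X, y) dataset; +/-1 labels assumed."""
        X, y = X.astype(float), y.copy()
        n, d = X.shape
        # label noise: flip a fraction of the labels to simulate uncertain ground truth
        flip = rng.random(n) < label_noise
        y[flip] = -y[flip]
        # missing data: blank out a fraction of the entries
        X[rng.random((n, d)) < missing_rate] = np.nan
        # random probes: append uninformative features that participants must reject
        probes = rng.standard_normal((n, n_probes))
        return np.hstack([X, probes]), y
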
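
Ranking sketch (for items 3 and 9): a sketch of a multi-objective ranking function that also penalizes run time. The weights and the example entries are made up for illustration; this is not the scoring rule that was actually used in the challenge.

    def challenge_score(ber, n_features, runtime_hours,
                        w_ber=1.0, w_feat=0.001, w_time=0.05):
        """Lower is better: weighted sum of error rate, feature count and run time."""
        return w_ber * ber + w_feat * n_features + w_time * runtime_hours

    # (participant, BER, number of selected features, run time in hours)
    entries = [("A", 0.10, 20, 0.5), ("B", 0.09, 200, 12.0), ("C", 0.11, 10, 0.2)]
    ranking = sorted(entries, key=lambda e: challenge_score(*e[1:]))
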
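
BER/AUC sketch (for item 11): the two candidate performance measures, written out with plain NumPy. The BER is computed from hard +/-1 predictions, the AUC from continuous scores via the Mann-Whitney rank-sum statistic (ties in the scores are ignored for brevity).

    import numpy as np

    def ber(y_true, y_pred):
        """Balanced error rate: average of the error rates on each class."""
        pos, neg = y_true == 1, y_true == -1
        return 0.5 * (np.mean(y_pred[pos] != 1) + np.mean(y_pred[neg] != -1))

    def auc(y_true, scores):
        """Area under the ROC curve via the rank-sum statistic."""
        order = np.argsort(scores)
        ranks = np.empty(len(scores))
        ranks[order] = np.arange(1, len(scores) + 1)
        n_pos = np.sum(y_true == 1)
        n_neg = len(y_true) - n_pos
        return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
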

Updates (16 August 2006): Many thanks to Vladimir for pointing out the following notes, which were missing from my summary. I have also added my own feedback to some of them.

1) The validation set should be big enough to ensure some reasonable correspondence between validation and final testing. AMIR: I dropped this part because, in my opinion, this was the main trick of the challenge: to realize that the performance on the validation set provided with the challenge datasets is not a good measure of how well algorithms perform on the test set, and further to encourage participants to come up with their own evaluation methods, which could also be used as a criterion for model selection (see the cross-validation sketch at the end of these notes).

2) The evaluation criterion may be anything, but it must be formulated very strictly in order to exclude any possibility of further discussion (all parameters used must be given). AMIR: I think we had a section in the challenge FAQ, called "What is the scoring method?", which clearly described the scoring methods we used in the final evaluation.

3) In order to attract wider participation in the competition, it would be good not to give a significant advantage to contestants who have access to supercomputers, and also to include exotic sub-tasks that require some mathematical/statistical creativity (as examples, consider the UC-2005/06, KDD-2006 and NiSIS-2006 (time series) data mining contests).

4) It was mentioned during the workshop that the following game will provide a second chance for those who are not completely satisfied with their results in the WCCI-2006 Challenge.

5) It would be fair not to impose a restriction to use only the given Matlab toolkit "Spider" to prepare submissions for the next challenge.
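
Cross-validation sketch (for the AMIR comment under update 1): a minimal illustration, assuming scikit-learn rather than the Spider toolkit, of using internal cross-validation instead of the provided validation set as the model-selection criterion. The helper select_model and the SVC candidate grid are hypothetical; the cross-validated BER is taken as 1 minus the balanced accuracy.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def select_model(X_train, y_train, candidates, cv=5):
        """Return the candidate with the lowest cross-validated BER estimate."""
        cv_ber = [1.0 - cross_val_score(m, X_train, y_train, cv=cv,
                                        scoring="balanced_accuracy").mean()
                  for m in candidates]
        best = int(np.argmin(cv_ber))
        return candidates[best], cv_ber[best]

    candidates = [SVC(C=c, kernel="rbf") for c in (0.1, 1.0, 10.0)]
    # best_model, est_ber = select_model(X_train, y_train, candidates)

The point of the sketch is only that the participant's own cross-validated estimate, not the leaderboard feedback on the validation set, serves as the criterion for choosing among models.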