Feature and Model Selection
MLSS08, Machine Learning Summer School
Ile de Ré, France, September 2-5, 2008
This course covers the fundamentals and applications of feature and model selection. Students will first be reminded of the basics of machine learning algorithms and the problem of overfitting avoidance. In the wrapper setting, feature selection will be introduced as a special case of the model selection problem. Methods to derive principled feature selection algorithms will be reviewed, as well as heuristic methods that work well in practice. One lecture will be devoted to the connections between feature selection and causal discovery. The last lecture will dig further into the problem of causal modeling and contrast the problem of prediction in an observational setting with that of predicting the consequences of actions performed by external agents. The class will be accompanied by several lab sessions. The course will appeal to students who like playing with data and want to learn practical data analysis techniques. Datasets from a variety of application domains will be made available: handwriting recognition, medical diagnosis, drug discovery, text classification, ecology, and marketing.
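To make the wrapper setting concrete, here is a minimal sketch (in Python for convenience; the class labs use Matlab/CLOP, so this code is purely illustrative): greedy forward selection that scores each candidate feature subset by the cross-validated accuracy of the learning machine itself, here a 1-nearest-neighbour classifier with leave-one-out validation on a tiny synthetic dataset.

```python
# Illustration (not class code): a greedy forward "wrapper" for feature
# selection. The wrapper treats the learning machine as a black box and
# scores feature subsets by its leave-one-out validation accuracy.
import random

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN restricted to the given features."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Nearest neighbour of sample i among all other samples.
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def forward_selection(X, y, n_features):
    """Greedily add the feature that most improves validation accuracy."""
    selected, remaining, best_score = [], list(range(n_features)), 0.0
    while remaining:
        score, f = max((loo_accuracy(X, y, selected + [f]), f)
                       for f in remaining)
        if score <= best_score:
            break  # stop when no candidate improves the estimate
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score

# Toy data: feature 0 determines the class; features 1 and 2 are noise.
random.seed(0)
y = [i % 2 for i in range(20)]
X = [[yi + random.gauss(0, 0.1),        # informative feature
      random.gauss(0, 1),               # noise
      random.gauss(0, 1)] for yi in y]  # noise

selected, score = forward_selection(X, y, 3)
print(selected, score)  # the informative feature 0 is picked, noise rejected
```

Note that the wrapper never inspects the features directly: swapping in another classifier only changes `loo_accuracy`, which is what makes wrappers generic but computationally expensive.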
Instructor:
Isabelle Guyon
ClopiNet consulting, Berkeley, California
Email: isabelle@clopinet.com
As a first reading, we recommend the short tutorial: Practical Feature Selection: from Correlation to Causality.
The class is based on material from:
- a book which compiles the results of the NIPS 2003 feature selection challenge and includes tutorial chapters. Download the feature extraction book introduction. Copies of the full book may be purchased.
- a book in preparation, which compiles the results of the WCCI06 performance prediction challenge and the IJCNN07 ALvsPK challenge. (Ask the instructor to get the password.)
The students must have Matlab(R) and R installed on their laptops. R can be downloaded for free. The student version of Matlab can be obtained from The MathWorks.
CLOP Matlab Package Installation
Download the version of CLOP (version 1.5) designed for this class. There are other versions of CLOP; please do not use them. In addition, you will need the demo code and data.
==> Windows users just have to run the script 'use_spider_clop' to set the Matlab path properly; most functions will then work.
==> Unix users will have to compile the LibSVM package if they want to use support vector machines. Follow the installation instructions.
==> All users will have to install R to use random forests (RF and RFFS). When you first start RF or RFFS, you will be prompted for the path of the R executable.
Schedule

| # | Slides | Date (September 2008) | Time | Description and class material |
| 1 | Introduction | Tue 2 | Lecture 1: 17h-17h45 | Introduction to machine learning. Basic learning machines; principles of learning; linear discriminant and support vector classifiers; naive Bayes. |
| 2 | Overfitting | Tue 2 | Lecture 2: 18h-18h45 | Learning without overlearning. Overfitting avoidance, performance prediction, cross-validation; structural risk minimization for character recognition; support vector machine tutorial; kernel ridge regression tutorial. |
| 3 | Lab 1 | Wed 3 | Practical work, group 1: 14h-14h45 | Basic learning machines and the overfitting problem. |
| 4 | Lab 1 | Wed 3 | Practical work, group 2: 15h-15h45 | Same. |
| 5 | Lab 1 | Wed 3 | Practical work, group 3: 16h-16h45 | Same. |
| 6 | Feature selection 1 | Thu 4 | Lecture 3: 8h45-9h30 | Introduction to feature selection. Filters, wrappers, and embedded methods. |
| 7 | Feature selection 2 | Thu 4 | Lecture 4: 9h45-10h30 | Embedded methods of feature selection. Learning theory put to work to build feature selection algorithms; Practical Feature Selection: from Correlation to Causality. |
| 8 | Lab 2 | Thu 4 | Practical work, group 1: 14h-14h45 | Introduction to the CLOP Matlab toolbox. Install CLOP and download the code in advance and play with the examples; apply various algorithms and visualize the results. |
| 9 | Lab 2 | Thu 4 | Practical work, group 2: 15h-15h45 | Same. |
| 10 | Lab 2 | Thu 4 | Practical work, group 3: 16h-16h45 | Same. |
| 11 | Causality 1 | Fri 5 | Lecture 5: 8h45-9h30 | Causality and feature selection. Limitations of feature selection methods that ignore the data selection process; causal feature selection. |
| 12 | Causality 2 | Fri 5 | Lecture 6: 9h45-10h30 | Causation and prediction. Basic causal discovery algorithms; prediction under manipulation. |
| 13 | Lab 3 | Fri 5 | Practical work, group 1: 14h-14h45 | Play with feature selection on real data. Install CLOP and download the code in advance and play with the examples; solve an end-to-end machine learning problem. |
| 14 | Lab 3 | Fri 5 | Practical work, group 2: 15h-15h45 | Same. |
| 15 | Lab 3 | Fri 5 | Practical work, group 3: 16h-16h45 | Same. |
| 16 | Full model selection 1 | Fri 5 | Lecture 7: 17h-17h45 | Full model selection techniques. Review of search methods and fitness functions (H. Jair Escalante). |
| 17 | Full model selection 2 | Fri 5 | Lecture 8: 18h-18h45 | Particle swarm optimization. Description of the algorithm and demonstration (H. Jair Escalante). |
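The theme of Lecture 2, learning without overlearning, can be demonstrated in a few lines. The sketch below (Python for convenience; it is not part of the class material, which uses Matlab/CLOP) trains a memorizing classifier on purely random labels: training accuracy is perfect, while leave-one-out cross-validation correctly reports chance-level performance.

```python
# Illustration: why training error is a poor performance predictor.
# With purely random labels there is nothing to learn, yet a classifier
# that can memorize the training set (1-nearest-neighbour) still scores
# 100% on the data it was trained on. Cross-validation is not fooled.
import random

random.seed(1)
n = 200
X = [random.random() for _ in range(n)]       # one irrelevant feature
y = [random.randint(0, 1) for _ in range(n)]  # random labels

def one_nn(train_idx, x):
    """Predict the label of the nearest training point to x."""
    j = min(train_idx, key=lambda i: abs(X[i] - x))
    return y[j]

# Training accuracy: each point is its own nearest neighbour -> perfect.
train_acc = sum(one_nn(range(n), X[i]) == y[i] for i in range(n)) / n

# Leave-one-out estimate: hold out each point in turn -> chance level.
loo_acc = sum(
    one_nn([j for j in range(n) if j != i], X[i]) == y[i] for i in range(n)
) / n

print(f"training accuracy: {train_acc:.2f}, leave-one-out: {loo_acc:.2f}")
```

The gap between the two numbers is exactly the overfitting that performance prediction methods (cross-validation, structural risk minimization) are designed to detect or avoid.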
References