Feature and Model Selection 

MLSS08, Machine Learning Summer School 

Ile de Ré, France, September 2-5, 2008



This course covers feature and model selection fundamentals and applications. The students will first be reminded of the basics of machine learning algorithms and the problem of overfitting avoidance. In the wrapper setting, feature selection will be introduced as a special case of the model selection problem. Methods to derive principled feature selection algorithms will be reviewed, as well as heuristic methods that work well in practice. A lecture will be devoted to the connections between feature selection and causal discovery. The last lecture will dig further into the problem of causal modeling and contrast the problem of prediction in an observational setting with that of predicting the consequences of actions performed by external agents. The class will be accompanied by several lab sessions. The course will be attractive to students who like playing with data and want to learn practical data analysis techniques. Datasets from a variety of application domains will be made available: handwriting recognition, medical diagnosis, drug discovery, text classification, ecology, and marketing.
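To give a concrete picture of the wrapper setting described above, here is a minimal illustrative sketch in Python/NumPy (the class labs themselves use Matlab/CLOP and R; the names, toy data, and the nearest-centroid classifier below are our own choices, not class material). A wrapper scores each candidate feature subset by the cross-validated error of a learning machine, and greedily adds the most helpful feature:

```python
import numpy as np

def cv_error(X, y, features, k=5, seed=0):
    """k-fold cross-validated error of a nearest-centroid classifier
    restricted to the given feature subset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        Xtr, Xte = X[np.ix_(train, features)], X[np.ix_(test, features)]
        ytr, yte = y[train], y[test]
        # class centroids estimated on the training fold only
        c0 = Xtr[ytr == 0].mean(axis=0)
        c1 = Xtr[ytr == 1].mean(axis=0)
        pred = (np.linalg.norm(Xte - c1, axis=1)
                < np.linalg.norm(Xte - c0, axis=1)).astype(int)
        errors.append(np.mean(pred != yte))
    return np.mean(errors)

def forward_selection(X, y, max_features):
    """Greedy wrapper: add the feature that most reduces CV error."""
    selected, remaining = [], list(range(X.shape[1]))
    best_err = 1.0
    while remaining and len(selected) < max_features:
        scores = [(cv_error(X, y, selected + [f]), f) for f in remaining]
        err, f = min(scores)
        if err >= best_err:          # stop when no candidate helps
            break
        best_err = err
        selected.append(f)
        remaining.remove(f)
    return selected, best_err

# Toy data: only feature 0 is informative, features 1-4 are noise.
rng = np.random.default_rng(42)
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 5))
X[:, 0] += 2.0 * y
sel, err = forward_selection(X, y, max_features=3)
print(sel, round(err, 2))
```

The same search could wrap any classifier; the point of the wrapper setting is that the feature subset is treated as a hyperparameter of the model selection problem.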

       As a first reading, we recommend the short tutorial: Practical Feature Selection: from Correlation to Causality.

       The class is based on material from:
       - a book, which compiles the results of the NIPS 2003 feature selection challenge and includes tutorial chapters.
       Download the feature extraction book introduction. Copies of the full book may be purchased.
       - a book in preparation, which compiles the results of the WCCI06 performance prediction challenge and IJCNN07 ALvsPK challenge. (Ask the instructor to get the password.)

The students must have Matlab(R) and R installed on their laptop. R can be downloaded for free. The student version of Matlab can be obtained from The MathWorks.

         
CLOP Matlab package Installation
Download the version of CLOP (version 1.5) designed for this class. Other versions of CLOP exist; please do not use them.
        In addition, you will need demo code and data.

       ==> Windows users will just have to run the script 'use_spider_clop' to set the Matlab path properly to use most functions.
       ==> Unix users will have to compile the LibSVM package if they want to use support vector machines. Follow the installation instructions.
       ==> All users will have to install R to use random forests (RF and RFFS). When you first start RF or RFFS, you will be prompted for the path of the R executable.

Schedule

Slides | Date (September 2008) | Time | Description and class material

1

Introduction
Tue 2, Lecture 1: 17h-17h45
Introduction to Machine Learning
Basic learning machines. Principle of learning.
Linear discriminant and support vector classifiers
Naive Bayes
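As a warm-up for the basic learning machines in Lecture 1, here is an illustrative Gaussian naive Bayes classifier in Python/NumPy (a sketch under our own assumptions, not the CLOP implementation). It fits per-class means, variances, and priors, and predicts by maximizing the log-posterior under the naive feature-independence assumption:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class feature means, variances, and class priors."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(y))
    return params

def predict_gaussian_nb(params, X):
    """Pick the class maximizing log prior + sum of per-feature
    Gaussian log-likelihoods (the naive independence assumption)."""
    scores = []
    for c, (mu, var, prior) in params.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var)
                                + (X - mu) ** 2 / var, axis=1)
        scores.append(log_lik + np.log(prior))
    classes = list(params.keys())
    return np.array(classes)[np.argmax(scores, axis=0)]

# Two well-separated Gaussian classes in 2-d.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
params = fit_gaussian_nb(X, y)
acc = np.mean(predict_gaussian_nb(params, X) == y)
print(round(acc, 2))
```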
2
Overfitting
Tue 2, Lecture 2: 18h-18h45
Learning without overlearning
Overfitting avoidance, performance prediction, cross-validation
Structural risk minimization for character recognition
Support vector machine tutorial
Kernel Ridge Regression tutorial
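To connect the overfitting theme of Lecture 2 with the kernel ridge regression tutorial, here is an illustrative Python/NumPy sketch (our own toy example, not class code) of kernel ridge regression in its dual form, where the ridge parameter `lam` controls the overfitting/underfitting trade-off:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    """Dual coefficients alpha = (K + lam*I)^-1 y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test, gamma=1.0):
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Noisy sine curve: lam -> 0 interpolates the noise (overfitting),
# a larger lam smooths the fit.
rng = np.random.default_rng(1)
X = rng.uniform(0, 2 * np.pi, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)
alpha = kernel_ridge_fit(X, y, lam=0.1, gamma=1.0)
X_test = np.linspace(0, 2 * np.pi, 50)[:, None]
pred = kernel_ridge_predict(X, alpha, X_test)
rmse = np.sqrt(np.mean((pred - np.sin(X_test[:, 0])) ** 2))
print(round(rmse, 3))
```

In practice `lam` and `gamma` would themselves be chosen by cross-validation, which is exactly the performance-prediction problem discussed in this lecture.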
3
Lab 1
Wed 3
Practical work group 1: 14h-14h45
Basic learning machines and the overfitting problem
Install CLOP, download the code in advance, and play with the examples.
4
Lab 1
Wed 3
Practical work group 2: 15h-15h45
Same
5
Lab 1
Wed 3
Practical work group 3: 16h-16h45
Same
6
Feature selection 1
Thu 4
Lecture 3: 8h45-9h30
Introduction to feature selection
Filters, wrappers, and embedded methods
Practical Feature Selection: from Correlation to Causality
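Of the three families covered in this lecture, filters are the simplest: they rank features independently of any learning machine. As an illustration (our own toy sketch in Python/NumPy, not class code), here is the classic filter that ranks features by absolute Pearson correlation with the target:

```python
import numpy as np

def correlation_ranking(X, y):
    """Filter method: rank features by |Pearson correlation| with the
    target, without training any learning machine."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(-np.abs(corr)), corr

# Six features: one strongly relevant, one weakly relevant, four noise.
rng = np.random.default_rng(7)
n = 300
y = rng.integers(0, 2, n).astype(float)
X = rng.normal(size=(n, 6))
X[:, 2] += 1.5 * y          # strongly relevant
X[:, 4] += 0.5 * y          # weakly relevant
ranking, corr = correlation_ranking(X, y)
print(ranking[:2])
```

Wrappers instead score feature subsets with a trained model, and embedded methods perform the selection during training itself; both are the subject of the next lecture.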
7
Feature selection 2
Thu 4
Lecture 4: 9h45-10h30
Embedded methods of feature selection
Learning theory put to work to build feature selection algorithms
Practical Feature Selection: from Correlation to Causality
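A standard example of an embedded method is L1-regularized least squares (the lasso), where the penalty drives irrelevant weights exactly to zero, so feature selection happens during training itself. The coordinate-descent sketch below is an illustrative Python/NumPy version under our own toy setup, not class code:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """L1-regularized least squares by coordinate descent.
    Soft-thresholding sets weights of irrelevant features exactly
    to zero: selection is embedded in the training procedure."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

# Eight features, of which only 0 and 3 generate the target.
rng = np.random.default_rng(3)
n, d = 200, 8
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[[0, 3]] = [2.0, -1.5]
y = X @ true_w + 0.1 * rng.normal(size=n)
w = lasso_cd(X, y, lam=20.0)
print(np.nonzero(w)[0])
```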
8
Lab 2
Thu 4
Practical work group 1: 14h-14h45
Introduction to the CLOP Matlab toolbox
Install CLOP, download the code in advance, and play with the examples.
Apply various algorithms, visualize the results.
9
Lab 2
Thu 4
Practical work group 2: 15h-15h45
Same
10
Lab 2
Thu 4
Practical work group 3: 16h-16h45
Same

11

Causality 1
Fri 5
Lecture 5: 8h45-9h30
Causality and feature selection
Limitations of feature selection methods that ignore the data selection process

Causal feature selection

12

Causality 2
Fri 5
Lecture 6: 9h45-10h30
Causation and prediction
Basic causal discovery algorithms. Prediction under manipulation.
Causality Workbench
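The gap between observational prediction and prediction under manipulation can be seen in a three-variable linear structural equation model (an illustrative Python/NumPy simulation of our own, not class material): an effect of the target is an excellent observational predictor, yet setting it by an external intervention leaves the target unchanged.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10000

def simulate(set_x2=None):
    """Linear structural equations: X1 -> Y -> X2.
    Passing set_x2 models an external agent forcing X2 (an
    intervention), which cuts the arrow from Y into X2."""
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)
    x2 = 1.5 * y + rng.normal(size=n) if set_x2 is None else np.full(n, set_x2)
    return x1, y, x2

# Observationally, X2 is a strong predictor of Y ...
_, y_obs, x2_obs = simulate()
corr_obs = np.corrcoef(x2_obs, y_obs)[0, 1]

# ... but under manipulation of X2, Y does not move at all.
_, y_a, _ = simulate(set_x2=-3.0)
_, y_b, _ = simulate(set_x2=+3.0)
shift = abs(y_b.mean() - y_a.mean())
print(round(corr_obs, 2), round(shift, 2))
```

This is why feature selection by predictive power alone can be misleading when the goal is to predict the consequences of actions: only manipulating causes of Y (here X1) changes Y.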

13

Lab 3
Fri 5
Practical work group 1: 14h-14h45
Play with feature selection on real data
Install CLOP, download the code in advance, and play with the examples.
Solve an end-to-end machine learning problem.

14

Lab 3
Fri 5
Practical work group 2: 15h-15h45
Same

15

Lab 3
Fri 5
Practical work group 3: 16h-16h45
Same

16

Full model selection 1
Fri 5
Lecture 7: 17h-17h45
Full model selection techniques
Review of search methods and fitness functions (H. Jair Escalante).

17

Full model selection 2
Fri 5
Lecture 8: 18h-18h45
Particle Swarm Optimization
Description of algorithm and demonstration (H. Jair Escalante).
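For readers unfamiliar with particle swarm optimization, here is a minimal illustrative sketch in Python/NumPy (our own toy implementation on a sphere function, not the demonstration code from the lecture). Each particle keeps a velocity pulled toward its personal best position and the swarm's global best:

```python
import numpy as np

def pso(fitness, dim, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimal particle swarm optimizer (minimization):
    w = inertia, c1 = pull toward each particle's own best,
    c2 = pull toward the swarm's global best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# Sphere function: minimum value 0 at the origin.
best, val = pso(lambda x: float(np.sum(x ** 2)), dim=3)
print(round(val, 4))
```

In full model selection, the fitness would be a cross-validation score and each particle a full candidate model (preprocessing, feature selection, classifier, and hyperparameters).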
