251-0553-00L
Feature Extraction
(winter semester 2005/2006)
*** NEW: The course has ended — take a look at the great work of the students! ***
*** NEWER: The course was followed by a reading group on causality inference ***
This course will cover feature extraction fundamentals and applications. Feature extraction is an essential pre-processing step for pattern recognition and machine learning problems. It is often decomposed into feature construction and feature selection. Classical feature construction algorithms will be reviewed. More attention will be given to the feature selection step, because of the recent success of methods involving a large number of "low-level" features (image pixels, text "bags-of-words", molecular structural features, gene expression coefficients).
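The construction-then-selection decomposition can be sketched in a few lines. The toy data, the pairwise-product construction, and the correlation-based ranking below are illustrative choices of mine, not the course's CLOP/Matlab setup; Python is used here only for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 5 raw "low-level" features; the target depends
# only on the product of the first two features.
X = rng.normal(size=(100, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

# Feature construction: augment the raw features with all pairwise products.
pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
products = np.stack([X[:, i] * X[:, j] for i, j in pairs], axis=1)
X_constructed = np.hstack([X, products])  # shape (100, 15)

# Feature selection: rank all candidates by absolute correlation with the
# target and keep the top 3. The constructed x0*x1 feature is column 5.
scores = np.array([abs(np.corrcoef(X_constructed[:, k], y)[0, 1])
                   for k in range(X_constructed.shape[1])])
top3 = np.argsort(scores)[::-1][:3]
print("selected feature indices:", top3)
```

No raw feature correlates with the target on its own, so a filter applied before construction would find nothing; applied after construction, it ranks the x0*x1 product first.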
The course will be attractive to students who like playing with data and want to learn practical data analysis techniques. The instructor has ten years of experience consulting for US startup companies in pattern recognition and machine learning. Datasets from a variety of application domains will be made available: handwriting recognition, medical diagnosis, drug discovery, text classification, ecology, marketing. The students will invent their own methods and compare their results to those of the NIPS 2003 feature selection challenge. They will also have the opportunity to participate in a live competition.
Machine learning and statistics students will reinforce their knowledge of modeling and assessment methods, including statistical testing and performance bounds. They will learn state-of-the-art methods of feature selection and space dimensionality reduction from the book written by the best entrants of the NIPS 2003 challenge.
Bioinformatics students will gain tools that are essential for performing medical diagnosis and biomarker discovery efficiently from genomics or proteomics data. Drug discovery from QSAR data and protein classification from "bags-of-motifs" will also be addressed.
Lectures: Thursday 10:00-12:00, CAB G 59 (plan)
Exercises: Thursday 12:00-13:00, CAB G 59 (Feel free to bring a sandwich to the exercise class.)
Instructor: Isabelle Guyon
Pattern Analysis and Machine Learning Group
Office: CAB E 45, Universitätstrasse 6
Phone: (044) 63 23010
Email: guyoni@inf.ethz.ch
Office hours: Tuesday 10:00-12:00
Prerequisites
These are recommended but will not be strictly enforced.
251-0535-00L Machine Learning I: Algorithms and Applications.
Some knowledge of matrix algebra, applied statistics, and optimization methods.
Homework will be given each week and discussed the following week in the exercise class. It should be turned in no later than Tuesday. Students may team up in pairs to do the homework.
Some homework will consist of classical course questions and exercises. Programming exercises will be done in Matlab, using the CLOP library of Matlab learning objects built on top of the Spider package. A number of assignments are designed to prepare the students for their professional life:
- You are an expert in feature extraction contacted to solve a given pattern recognition problem. Write a two-page proposal outlining your strategy to win the contract.
- You have done a piece of original research and want to submit it to a conference. Write a two-page extended summary.
- You have invented a revolutionary new algorithm. Write the first ten claims of a patent that will protect it.
- You are an expert in feature extraction contacted by an editor to review a paper. Write the review.
Methods for completing these tasks will be provided in class.
The students assigned to prepare the presentation of a book chapter in a given week will be exempted from homework.
Requirements:
1. The main course requirement is to make one entry in the feature selection challenge meeting a standard of quality that will be specified in class. A Matlab toolbox with some basic algorithms is provided. Students are encouraged to try some existing methods and to invent their own.
2. A second requirement is to select a chapter of the feature extraction book and prepare a 45-minute presentation (30 min lecture + 15 min questions). The instructor will meet with each student to help with the preparation and to ensure that the resulting presentation is interesting and accessible to students in the class who are not experts in the given topic. Please sign up for chapters to present, on a first-come, first-served basis.
3. As a third requirement, students will describe the method they used and the results they obtained for their entry to the feature selection challenge in a poster. The final exam will consist of a poster presentation. Basic questions from the lists provided every week may be asked at that time.
Students can team up by pairs to fulfill the course requirements.
Grading:
1. Challenge submissions (10 points):
For each dataset, a baseline model is provided, with baseline performance BER0 and number of features n0.
- Earn 1 point per dataset for a valid submission with {BER < BER0, any n} or {BER <= BER0 and n < n0}.
- Earn 1 additional point per dataset for a submission matching or outperforming the performance of the best challenge entry within the statistical error bar.
2. Paper presentation (5 points)
3. Final exam, poster + questions (17 points)
[Template of poster] [List of questions]
(contents = 4; presentation = 4; questions = 9)
Grade = min(6, num_points/4)
Pass: Grade >= 4
The class is worth 5 units.
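The scoring rules above can be written out directly. The function names and the treatment of the error bar below are my own reading of the rules, not official scoring code.

```python
def submission_points(ber, n_features, ber0, n0, best_ber, error_bar):
    """Points earned for one dataset entry under the grading rules.

    1 point if the entry beats the baseline: BER < BER0 with any number
    of features, or BER <= BER0 with fewer features than the baseline.
    1 extra point if the entry matches or outperforms the best challenge
    entry within the statistical error bar (assumed additive here).
    """
    points = 0
    if ber < ber0 or (ber <= ber0 and n_features < n0):
        points += 1
    if ber <= best_ber + error_bar:
        points += 1
    return points


def final_grade(num_points):
    """Grade = min(6, num_points / 4); pass if the grade is at least 4."""
    grade = min(6.0, num_points / 4.0)
    return grade, grade >= 4.0
```

For example, an entry with BER = 0.05 against a baseline of BER0 = 0.10 earns the baseline point, and also earns the bonus point if the best entry scored 0.04 with an error bar of 0.02; sixteen total points across all requirements is the minimum passing score.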
Schedule
Students will select one chapter of the feature extraction book from the following list (green-shaded cells) to fulfill the second course requirement, and should sign up for a presentation date. All this will be done on a first-come, first-served basis; please email the instructor. Tutorial chapters (yellow-shaded cells) will be presented by the instructor. The homeworks shaded in dark yellow may be used towards the first and third course requirements, but students are free to make other entries.
Week 1, October 27: Introduction
Chapter: Introduction to Feature Extraction (Isabelle Guyon and André Elisseeff)
Presenter: I. Guyon
Exercise class: Introduction to Matlab learning objects.

Week 2, November 3: Learning machines
Chapter: Chapter 1: Learning Machines (Norbert Jankowski and Krzysztof Grabczewski)
Presenter: I. Guyon
Exercise class: Solution of homework 1: program a preprocessing learning object.

Week 3, November 10: Shrinkage
Chapters: Kernel Ridge Regression tutorial (I. Guyon et al.); Linear Discriminant and Support Vector Classifiers (Isabelle Guyon and David Stork, for the exercises)
Presenter: I. Guyon
Exercise class: Derive common learning rules by computing the gradient of common risk functionals. Instructions for homework 3 (homework 2 extended).

Week 4, November 17: Feature construction
Chapter: Introduction to Feature Extraction (Isabelle Guyon and André Elisseeff)
Presenter: I. Guyon
Exercise class: Solution of homeworks 2-3: write your own feature construction object. Tips to complete homework 4.

Week 5, November 24: Filter methods for feature ranking
Chapter: Chapter 3: Filter Methods (Wlodzislaw Duch)
Presenter: I. Guyon
Exercise class: Homework 5: make an entry for the Gisette dataset.

Week 6, December 1: Assessment methods
Chapters: Appendix A: Elementary Statistics (Gérard Dreyfus); Chapter 2: Assessment Methods (Gérard Dreyfus and Isabelle Guyon)
Presenter: I. Guyon
Exercise class: Small problems in applied statistics (Student t-test, etc.). Homework 6: write a proposal for solving a problem involving feature extraction; give special care to explaining assessment methods.

Week 7, December 8: Support vector machines and filter methods
Chapters: A. Chapter 12: Combining Support Vector Machines with Various Feature Selection Strategies (Yi-Wei Chen and Chih-Jen Lin); B. Chapter 20: Combining a Filter Method with SVMs (Thomas Navin Lal, Olivier Chapelle, and Bernhard Schoelkopf)
Presenters: Lecture: I. Guyon; A-B: Georg Schneider
Exercise class: Instructions for homework 7: design a "new" filter; implement the probe method and compute the false discovery rate.

Week 8, December 15: Wrappers
Chapter: Chapter 4: Search Strategies (Juha Reunanen)
Presenter: I. Guyon
Exercise class: Tips for making a good oral presentation. Homework 8: make a challenge entry for the Dexter dataset.

Week 9, December 22: Embedded methods
Chapter: Chapter 5: Embedded Methods (Thomas Navin Lal, Olivier Chapelle, Jason Weston, and André Elisseeff)
Presenters: I. Guyon, A. Elisseeff
Exercise class: Correction of homework 8. Exercises on embedded methods.

December 29 and January 5: Weihnachtsferien (Christmas holidays), no class.

Week 10, January 12: Embedded methods
Chapters: A. Chapter 16: Sparse, Flexible and Efficient Modeling using L1 Regularization (Saharon Rosset and Ji Zhu); B. Chapter 18: Bayesian SVM for Feature Weighting and Selection (Wei Chu et al.)
Presenters: A. Markus Uhr; B. Patrick Pletscher
Exercise class: How to protect your intellectual property. Homework 10: make an entry for the Madelon dataset using the Relief filter.

Week 11, January 19: Information theoretic methods
Chapters: Chapter 6: Information-Theoretic Methods for Feature Selection and Construction (Kari Torkkola); Feature Extraction by Non-Parametric Mutual Information Maximization (Kari Torkkola)
Presenter: I. Guyon
Exercise class: Exercises on information-theoretic methods. Correction of homework 10: tips to finalize the Madelon entry. Homework 11: how to review a paper.

Week 12, January 26: Bayesian and ensemble methods
Chapters: Chapter 7: Ensemble Learning and Feature Selection (Eugene Tuv); Chapter 27: Spectral Dimensionality Reduction (Y. Bengio et al.)
Presenters: Lecture: I. Guyon; paper presentation: Yvonne Moh and Peter Orbanz
Exercise class: AdaBoost. Homework 12: experiment with ensemble methods using the Arcene dataset.

Week 13, February 2: Bayesian and ensemble methods
Chapters: A. Chapter 10: High Dimensional Classification with Bayesian Neural Networks and Dirichlet Diffusion Trees (Radford M. Neal and Jianguo Zhang); B. Chapter 11: Ensembles of Regularized Least Squares Classifiers for High-Dimensional Problems (Kari Torkkola and Eugene Tuv)
Presenters: A. Jiwei Li; B. Theodor Mader
Exercise class: Tips to write an extended summary and make a poster in preparation for the exam. Homework 13: baseline methods.

Week 14, February 9: Wrap up
The take-home message of the class will be delivered.
Presenter: I. Guyon
Exercise class: Prepare for the exam. Homework 14: make an entry for the Dorothea dataset.
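As a preview of the probe method assigned in homework 7, the idea can be sketched as follows: append random "probe" features that are irrelevant by construction, rank everything with a filter criterion, and use the probes to estimate how many selected features are false discoveries. The data, the filter criterion (absolute correlation), and the particular FDR estimate below are illustrative assumptions of mine, not the official homework solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: 200 samples, 20 features, only the first two relevant.
n, d, d_probe = 200, 20, 20
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Probe method: append random features drawn like the real ones.
probes = rng.normal(size=(n, d_probe))
X_all = np.hstack([X, probes])

# Filter criterion: absolute correlation with the target.
score = np.array([abs(np.corrcoef(X_all[:, j], y)[0, 1])
                  for j in range(d + d_probe)])
order = np.argsort(score)[::-1]

# Select the top-k features; count how many probes sneak in.
k = 5
probes_in_top = int(np.sum(order[:k] >= d))

# Estimated false discovery rate: probes stand in for irrelevant real
# features, so scale their count by the ratio of real features to probes.
fdr_hat = (probes_in_top / d_probe) * d / k
print(f"probes among top {k}: {probes_in_top}, estimated FDR: {fdr_hat:.2f}")
```

The two truly relevant features rank first, and the probe count gives a calibration-free estimate of how far down the ranked list it is safe to go.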
Lecture outlines