251-0553-00L

Feature Extraction

(winter semester 2005/2006)



*** NEW: The course has ended, take a look at the great work of the students! ***
*** NEWER: The course was followed by a reading group on causality inference ***

This course will cover feature extraction fundamentals and applications. Feature extraction is an essential pre-processing step for pattern recognition and machine learning problems. It is often decomposed into feature construction and feature selection. Classical algorithms of feature construction will be reviewed. More attention will be given to the feature selection step because of the recent success of methods involving a large number of "low-level" features (image pixels, text "bags-of-words", molecular structural features, gene expression coefficients).
The course will be attractive to students who like playing with data and want to learn practical data analysis techniques. The instructor has ten years of experience consulting for startup companies in the US on pattern recognition and machine learning. Datasets from a variety of application domains will be made available: handwriting recognition, medical diagnosis, drug discovery, text classification, ecology, marketing. The students will invent their own methods and compare their results to those of the NIPS 2003 feature selection challenge. They will also have the opportunity to participate in a live competition.
Machine learning and statistics students will reinforce their knowledge of modeling and assessment methods, including statistical testing and performance bounds. They will learn state-of-the-art methods of feature selection and space dimensionality reduction from the book written by the best entrants of the NIPS 2003 challenge.
Bioinformatics students will gain tools that are essential for efficient medical diagnosis and biomarker discovery from genomics or proteomics data. Drug discovery from QSAR data and protein classification from "bags-of-motifs" will also be addressed.

          Lectures: Thursday 10:00-12:00, CAB G 59 (plan)
          Exercises: Thursday 12:00-13:00, CAB G 59 (Feel free to bring a sandwich to the exercise class.)

Homework will be assigned each week and discussed the following week in the exercise class. Assignments should be turned in no later than Tuesday. Students can team up in pairs to do the homework.

Some homeworks will consist of classical course questions and exercises. Programming exercises will be performed in Matlab, using the CLOP library of Matlab learning objects built on top of the Spider package. A number of homeworks are designed to prepare students for their professional life:

- You are an expert in feature extraction contacted to solve a given pattern recognition problem. Write a two-page proposal outlining your strategy to win the contract.
- You have done a piece of original research and want to submit it to a conference. Write a two-page extended summary.
- You have invented a revolutionary new algorithm. Write the first ten claims of a patent that will protect it.
- You are an expert in feature extraction contacted by an editor to review a paper. Write the review.

Methods for completing these tasks will be provided in class.
Students assigned to present a book chapter in a given week will be exempted from that week's homework.

Requirements:

1. The main course requirement is to make one entry in the feature selection challenge matching some standard of quality, which will be specified in class. A Matlab toolbox is provided with some basic algorithms. Students are encouraged to try existing methods and to invent their own.

2.    A second requirement is to select a chapter of the feature extraction book and prepare a 45-minute presentation (30 min lecture + 15 min questions). The instructor will meet with each student to help with the preparation, and ensure that the resulting presentation will be interesting and accessible to students in the class who are not experts in the given topic. Please sign up for chapters to present, on a first-come, first-served basis.

3.    As a third requirement, students will have to describe, in a poster, the method they used and the results they obtained for their entry to the feature selection challenge. The final exam will consist of a poster presentation. Basic questions from the lists provided every week may be asked at that time.

Students can team up by pairs to fulfill the course requirements.

Grading:

1. Challenge submissions (10 points):
    For each dataset, a baseline model is provided with baseline performance BER0 and number of features n0.
    Earn 1 point per dataset for a valid submission with {BER < BER0, any n} or {BER <= BER0 and n < n0}.
    Earn 1 additional point per dataset for a submission outperforming or matching the performance of the best challenge entry within the statistical error bar.
2. Paper presentation (5 points)
3. Final exam, poster + questions (17 points)
    [Template of poster] [List of questions]
    (contents = 4; presentation = 4; questions = 9)

            Grade = min(6, num_points/4);
            Pass: Grade >= 4

The class is worth 5 units.

Schedule

Students will select one chapter of the feature extraction book from the schedule below to fulfill the second course requirement, and should sign up for a presentation date, on a first-come, first-served basis; please email the instructor. Tutorial chapters will be presented by the instructor. The homeworks that consist of challenge entries may be used towards the first and third course requirements, but students are free to make other entries.

Each week below lists the topic and date (lectures are on Thursday), the book chapter(s), the presenters, and the exercise class.

Week 1: Introduction -- October 27
  Chapter: Introduction to Feature Extraction, Isabelle Guyon and André Elisseeff
  Presenter: I. Guyon
  Exercise class: Introduction to Matlab learning objects.

Week 2: Learning machines -- November 3
  Chapter: Chapter 1: Learning Machines, Norbert Jankowski and Krzysztof Grabczewski
  Presenter: I. Guyon
  Exercise class: Solution of homework 1: program a preprocessing learning object.

Week 3: Shrinkage -- November 10
  Readings: Structural risk minimization for character recognition, I. Guyon et al.; Kernel Ridge Regression tutorial, I. Guyon; Linear discriminant and support vector classifiers, Isabelle Guyon and David Stork (for the exercises)
  Presenter: I. Guyon
  Exercise class: Derive common learning rules by computing the gradient of common risk functionals. Instructions for homework 3 (homework 2 extended).

Week 4: Feature construction -- November 17
  Chapter: Introduction to Feature Extraction, Isabelle Guyon and André Elisseeff
  Presenter: I. Guyon
  Exercise class: Solution of homeworks 2-3: write your own feature construction object. Tips to complete homework 4.

Week 5: Filter methods for feature ranking -- November 24
  Chapter: Chapter 3: Filter Methods, Wlodzislaw Duch
  Presenter: I. Guyon
  Exercise class: Homework 5: make an entry for the Gisette dataset.

Week 6: Assessment methods -- December 1
  Chapters: Book Appendix A: Elementary Statistics, Gérard Dreyfus; Chapter 2: Assessment Methods, Gérard Dreyfus and Isabelle Guyon
  Presenter: I. Guyon
  Exercise class: Small problems in applied statistics (Student's t-test, etc.). Homework 6: write a proposal for solving a problem involving feature extraction; give special care to explaining assessment methods.

Week 7: Support vector machines and filter methods -- December 8
  Chapters: A. Chapter 12: Combining Support Vector Machines with Various Feature Selection Strategies, Yi-Wei Chen and Chih-Jen Lin; B. Chapter 20: Combining a Filter Method with SVMs, Thomas Navin Lal, Olivier Chapelle, and Bernhard Schoelkopf
  Presenters: lecture: I. Guyon; papers A-B: Georg Schneider
  Exercise class: Instructions for homework 7: design a "new" filter, implement the probe method, and compute the false discovery rate.

Week 8: Wrappers -- December 15
  Chapter: Chapter 4: Search Strategies, Juha Reunanen
  Presenter: I. Guyon
  Exercise class: Tips for making a good oral presentation. Homework 8: make a challenge entry for the Dexter dataset.

Week 9: Embedded methods -- December 22
  Chapter: Chapter 5: Embedded Methods, Thomas Navin Lal, Olivier Chapelle, Jason Weston, and André Elisseeff
  Presenters: I. Guyon and A. Elisseeff
  Exercise class: Correction of homework 8. Exercises on embedded methods.

Christmas break (Weihnachtsferien): December 29 and January 5, no class.

Week 10: Embedded methods -- January 12
  Chapters: A. Chapter 16: Sparse, Flexible and Efficient Modeling using L1 Regularization, Saharon Rosset and Ji Zhu; B. Chapter 18: Bayesian SVM for Feature Weighting and Selection, Wei Chu et al.
  Presenters: A. Markus Uhr; B. Patrick Pletscher
  Exercise class: How to protect your intellectual property. Homework 10: make an entry for the Madelon dataset using the Relief filter.

Week 11: Information theoretic methods -- January 19
  Chapters: Chapter 6: Information-Theoretic Methods for Feature Selection and Construction, Kari Torkkola; Feature Extraction by Non-Parametric Mutual Information Maximization, Kari Torkkola
  Presenter: I. Guyon
  Exercise class: Exercises on information-theoretic methods. Correction of homework 10: tips to finalize the Madelon entry. Homework 11: how to review a paper.

Week 12: Bayesian and ensemble methods -- January 26
  Chapters: Chapter 7: Ensemble Learning and Feature Selection, Eugene Tuv; Chapter 27: Spectral Dimensionality Reduction, Y. Bengio et al.
  Presenters: lecture: I. Guyon; paper presentation: Yvonne Moh and Peter Orbanz
  Exercise class: Adaboost exercise. Homework 12: using the Arcene dataset, experiment with ensemble methods.

Week 13: Bayesian and ensemble methods -- February 2
  Chapters: A. Chapter 10: High Dimensional Classification with Bayesian Neural Networks and Dirichlet Diffusion Trees, Radford M. Neal and Jianguo Zhang; B. Chapter 11: Ensembles of Regularized Least Squares Classifiers for High-Dimensional Problems, Kari Torkkola and Eugene Tuv
  Presenters: A. Jiwei Li; B. Theodor Mader
  Exercise class: Tips for writing an extended summary and making a poster in preparation for the exam. Homework 13: baseline methods.

Week 14: Wrap up -- February 9
  Lecture: The take-home message of the class will be delivered.
  Presenter: I. Guyon
  Exercise class: Prepare for the exam. Homework 14: make an entry for Dorothea.


Exam date: Monday, March 27, CAB G 69.2


Lecture outlines

Week 1: Introduction [slides]
Questions of the week: a list of questions you should be able to answer after this lecture.
Homework for next week: install the Matlab learning object package.
Feature extraction is an essential step in machine learning, performed either separately or jointly with the learning process: preprocessing and feature construction are followed by feature selection. We will present some applications that pose challenges to feature extraction because of the high dimensionality of the input space (text categorization, drug screening, gene selection).
A quick review of the machine learning vocabulary (supervised/unsupervised learning, risk minimization, overfitting, maximum likelihood, etc.) will also be given.
The exercise session will be devoted to explaining how to use Matlab learning objects and to giving guidelines for completing homework 1 (for November 3).

Week 2: Learning Machines [slides]
Questions of the week
Homework for next week: make a first submission for Gisette.
This lecture will briefly introduce feature selection strategies (filters, wrappers, and embedded methods), which will be developed in the following lectures. The review of learning and generalization started in the first lecture will be continued.
We will also review learning machine architectures and algorithms: linear methods, kernel methods, and neural networks.

Week 3: Shrinkage [slides]
Questions of the week
Homework for next week: make a first submission for Gisette (continued).
Ockham's Razor is the principle proposed by William of Ockham in the fourteenth century: "Pluralitas non est ponenda sine necessitate", which translates as "entities should not be multiplied unnecessarily". This principle provides guidance in modeling: of two theories providing similarly good predictions, prefer the simplest one. Hence, according to Ockham, we should shave off unnecessary parameters of our models.
In the brain, information is stored in billions of synapses connecting the brain cells or neurons. Young children grow synapses in excess. Exposure to enriched environments with extra sensory and social stimulation enhances the connectivity of the synapses, but children and adolescents also lose up to 20 million synapses per day. This synaptic pruning is believed to play an important role in learning.
In this lecture, we will justify theoretically the role of "weight decay" in learning. We will connect it to regularization and model priors. In several models, the number of parameters is directly related to the number of inputs or features; hence there is an obvious link between shrinkage and feature selection.
We will make this link explicit.
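For illustration, here is a minimal plain-Matlab sketch of shrinkage via ridge regression, where the weight-decay term lambda*||w||^2 is added to the squared loss (toy data made up for the example; this is not the CLOP implementation used in class):

    % Ridge regression: w = argmin ||X*w - y||^2 + lambda*||w||^2
    % Toy data: 50 examples, 20 features, only the first 3 features are relevant.
    n = 50; d = 20;
    X = randn(n, d);
    w_true = [2; -1; 0.5; zeros(d-3, 1)];
    y = X*w_true + 0.1*randn(n, 1);

    lambda = 1;                                  % weight decay strength
    w_ridge = (X'*X + lambda*eye(d)) \ (X'*y);   % shrunken weights
    w_ols   = (X'*X) \ (X'*y);                   % unregularized solution

    % Larger lambda shrinks the weights of irrelevant features towards zero.
    fprintf('norm of irrelevant weights: OLS %.3f, ridge %.3f\n', ...
            norm(w_ols(4:end)), norm(w_ridge(4:end)));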

Week 4: Feature construction [slides]
Questions of the week
Homework for next week: write your own feature construction object.

This lecture will review basic feature construction strategies. The examples will include various transforms, such as the Fourier transform, and kernel "convolutional" methods. The lecture will also review a number of preprocessing techniques involving noise modeling and the identification of nuisance variables, and give a brief introduction to experimental design.

Week 5: Filter methods for feature ranking [slides]
Questions of the week
Homework for next week: finalize your Gisette entry.
Correlation methods and other methods for ranking features in order of relevance will be presented. The goal of this lecture is to provide students with easy-to-implement methods, which are also important baseline methods.
We will also take time to review the material of past lectures and put things in perspective.
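As a baseline illustration, here is a plain-Matlab sketch of correlation-based feature ranking on made-up toy data (the CLOP filter objects used in the exercises package this kind of computation differently):

    % Rank features by absolute Pearson correlation with a binary target.
    % X: n x d data matrix, y: n x 1 labels in {-1, +1} (toy data here).
    n = 100; d = 500;
    X = randn(n, d);
    y = sign(randn(n, 1));
    X(:, 1) = X(:, 1) + y;               % make feature 1 informative

    Xc = X - repmat(mean(X), n, 1);      % center the features
    yc = y - mean(y);
    score = abs(Xc' * yc) ./ (sqrt(sum(Xc.^2))' * norm(yc) + eps);

    [sorted_score, idx_feat] = sort(score, 'descend');  % best features first
    disp(idx_feat(1:5)');                % top 5 ranked features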

Week 6: Assessment methods [slides]
Questions of the week
Homework for next week: Write a proposal.
The problems of constructing and selecting features cannot be separated from that of assessing how good the features are. Complex model selection problems have to be addressed in which both feature subsets and model hyper-parameters must be selected jointly. We will review tools in statistics and statistical learning theory that allow us to make such selections accurately. These will include statistical testing techniques and cross-validation methods.
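A minimal sketch of k-fold cross-validation with the balanced error rate (BER), using a simple centroid classifier as a stand-in for whatever CLOP model you actually use (toy data and names made up for the example):

    % 5-fold cross-validation estimate of the balanced error rate (BER).
    % X: n x d data, y: n x 1 labels in {-1,+1}; here a trivial toy problem.
    n = 100; X = randn(n, 2); y = sign(X(:,1) + 0.5*randn(n,1));

    k = 5;
    perm = randperm(n);
    fold = mod(0:n-1, k) + 1;            % fold assignment after permutation
    ber = zeros(k, 1);
    for i = 1:k
        test  = perm(fold == i);
        train = perm(fold ~= i);
        % Placeholder "model": a nearest-centroid classifier; substitute your own.
        mu_pos = mean(X(train(y(train) == +1), :), 1);
        mu_neg = mean(X(train(y(train) == -1), :), 1);
        d_pos = sum((X(test,:) - repmat(mu_pos, numel(test), 1)).^2, 2);
        d_neg = sum((X(test,:) - repmat(mu_neg, numel(test), 1)).^2, 2);
        yhat = sign(d_neg - d_pos);      % closer to positive centroid => +1
        fp = mean(yhat(y(test) == -1) == +1);
        fn = mean(yhat(y(test) == +1) == -1);
        ber(i) = (fp + fn) / 2;          % balanced error rate on this fold
    end
    fprintf('CV BER: %.3f +- %.3f\n', mean(ber), std(ber)/sqrt(k));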

Week 7: Support Vector Machines and filter methods [lecture slides] [paper presentation slides]
Questions of the week
Homework for next week: Write a feature ranking filter.
This lecture will present support vector machines (SVMs) for classification and regression. SVMs will then be put in the more general framework of regularized risk optimization. The lecture will be followed by the discussion of two papers that successfully used SVMs, in combination with filter methods, in the feature selection challenge.

Week 8: Wrappers [slides]
Questions of the week
Homework for next week: Make an entry for the Dexter dataset.
This lecture will present wrappers for feature selection. Wrappers are methods that use a learning machine to assess the usefulness of subsets of features. A search method is used to explore feature space, since for a large number of features it is infeasible to try all possible feature subsets. We will review search strategies, including forward and backward selection, floating search, simulated annealing, and genetic algorithms. One difficulty with wrappers is avoiding overfitting: by searching feature space heavily, it is easy to find a subset of features that performs well on a limited number of examples. We will compare the search methods for their relative computational and statistical complexity to determine which ones are the most promising. The class will be complemented by a review of cross-validation techniques, which are necessary to master when using wrapper methods.
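A hedged sketch of a greedy forward-selection wrapper scored on a hold-out split, with least squares as a placeholder learning machine (toy data; this is not the search strategy implemented in CLOP):

    % Greedy forward selection (wrapper) scored on a hold-out set.
    % Toy data: 80 examples, 10 features, only features 1 and 2 are useful.
    n = 80; d = 10;
    X = randn(n, d);
    y = sign(X(:,1) + X(:,2) + 0.5*randn(n,1));
    tr = 1:40; va = 41:80;                       % simple split for scoring

    selected = [];
    remaining = 1:d;
    best_err = inf;
    for step = 1:d
        errs = zeros(size(remaining));
        for j = 1:numel(remaining)
            S = [selected remaining(j)];
            % Placeholder learning machine: least squares on the subset S.
            w = pinv(X(tr, S)) * y(tr);
            yhat = sign(X(va, S) * w);
            errs(j) = mean(yhat ~= y(va));       % validation error with S
        end
        [e, k] = min(errs);
        if e >= best_err, break; end             % stop when no improvement
        best_err = e;
        selected = [selected remaining(k)];
        remaining(k) = [];
    end
    disp(selected);                              % features chosen by the wrapper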

Week 9: Embedded methods -- lecture [slides]
Questions of the week
This lecture will introduce algorithms that select features in the process of learning, including gradient-based feature scaling, nested subset methods, use of performance bounds, direct objective optimization, and decision trees.
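In the spirit of recursive feature elimination (RFE), here is a plain-Matlab toy sketch of a nested subset method, using ordinary least squares in place of an SVM (made-up data; not the book's exact algorithm):

    % RFE-style elimination with a linear model: at each pass, train on the
    % surviving features and drop the one with the smallest weight magnitude;
    % the elimination order gives a feature ranking.
    n = 60; d = 12;
    X = randn(n, d); y = sign(X(:,1) - X(:,2) + 0.3*randn(n,1));

    surviving = 1:d;
    ranking = zeros(1, d);
    for pos = d:-1:1
        w = pinv(X(:, surviving)) * y;       % linear model on surviving features
        [dummy, worst] = min(abs(w));        % least useful surviving feature
        ranking(pos) = surviving(worst);     % eliminated later = ranked better
        surviving(worst) = [];
    end
    disp(ranking);                           % best feature first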

Week 10: Embedded methods -- paper presentations [lecture slides][Paper A slides][Paper B slides][Paper B handouts]
Questions of the week
Homework for next week: Make an entry for Madelon.
After a brief review of past lectures, two students will present papers of interest on embedded methods. The exercise class will be devoted to providing tips on how to effectively protect intellectual property and write the claims of a patent.

Week 11: Information theoretic methods [slides]
Questions of the week
Homework for next week: Make an entry for Madelon.
This lecture will review techniques stemming from the information-theoretic framework (mutual information, the information bottleneck). By using information theory, variable selection and feature construction can be viewed as coding and distortion problems.
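A toy sketch of estimating the mutual information between a binarized feature and a binary target from empirical counts (plain Matlab, made-up data):

    % Empirical mutual information between a binary feature x and target y.
    n = 1000;
    y = (rand(n,1) > 0.5);                 % target in {0,1}
    x = (rand(n,1) < 0.3 + 0.4*y);         % feature statistically related to y

    mi = 0;
    for a = 0:1
        for b = 0:1
            pab = mean(x == a & y == b);   % joint probability estimate
            pa  = mean(x == a);
            pb  = mean(y == b);
            if pab > 0
                mi = mi + pab * log2(pab / (pa*pb));
            end
        end
    end
    fprintf('estimated mutual information: %.3f bits\n', mi);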


Week 12: Bayesian and ensemble methods [lecture slides][Paper presentation slides]
Questions of the week
Homework for next week: Make an entry for Arcene.
Methods involving voting over several feature subsets will be covered: Bayesian approaches to feature selection, bagging/stability, and boosting methods will be reviewed.
A paper on feature construction will be presented.
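A minimal sketch of voting over models trained on random feature subsets (a random-subspace flavor of ensemble methods; toy data, least squares as the base learner, all names made up):

    % Voting over models trained on random feature subsets.
    n = 100; d = 50;
    X = randn(n, d);
    y = sign(X(:,1) + X(:,2) + 0.5*randn(n,1));
    Xte = randn(20, d); yte = sign(Xte(:,1) + Xte(:,2) + 0.5*randn(20,1));

    T = 25; m = 10;                       % number of voters, features per voter
    votes = zeros(size(Xte,1), 1);
    for t = 1:T
        S = randperm(d); S = S(1:m);      % random feature subset
        w = pinv(X(:, S)) * y;            % least-squares base learner
        votes = votes + sign(Xte(:, S) * w);
    end
    yhat = sign(votes);                   % majority vote of the ensemble
    fprintf('ensemble test error: %.2f\n', mean(yhat ~= yte));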

Week 13: Bayesian and ensemble methods [slides paper A][slides paper B]
Homework for next week: Make an entry for Arcene, Dexter, Gisette, and Madelon. Work on the poster. [slides on tips to make your poster]
Papers will be presented by students.

Week 14: Wrap up
[slides]
Exam questions: You will be asked 9 of those questions at the exam.
Template of poster: You will need to present your challenge results as a poster.
Last homework: Make an entry for all datasets to the challenge website. A baseline method for Dorothea is provided.
The take-home message of the class will be delivered.


Homework

Homework 1 (for November 1): Learn about learning objects.

1) Download the 5 datasets of the feature selection challenge. Put them in <my_root>/<data_dir>.
2) Download the Matlab package CLOP. Put it in <my_root>/<code_dir>.
3) Linux users: build LibSVM in <my_root>/<code_dir>/challenge_objects/packages/libsvm-mat-2.8-1.
4) Download the sample code. Put it in <my_root>. At the Matlab prompt, type > main;
5) View the examples in <my_root>/model_examples.m. Create a new model by chaining learning objects (see FAQ for help) and try it.
6) Write your own preprocessing learning object, imitating the examples in <my_root>/<code_dir>/challenge_objects/prepro.
Suggestions:
- Performing a simple functional transformation of all the data matrix elements, like a square root or a power law;
- Adding more features that are products of the original features;
- Binarizing the features;
- Replacing the feature values by their rank in the corresponding sorted data column. Note that it is not straightforward to apply this transformation to the validation and test sets; propose solutions (see the sketch after this homework).
7) Email your preprocessing learning object to: guyoni@inf.ethz.ch with subject "Homework1" no later than Tuesday November 1st.
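For the rank-transform suggestion above, here is a plain-Matlab sketch of one possible solution (this is not a CLOP prepro object; variable names and the mapping rule are made up for illustration):

    % Rank-transform sketch: training columns are replaced by their ranks;
    % validation/test values are mapped using only the sorted training column,
    % so no validation/test information leaks into the transformation.
    Xtr = rand(10, 3); Xva = rand(5, 3);   % toy train and validation matrices
    [n, d] = size(Xtr);
    Rtr = zeros(n, d); Rva = zeros(size(Xva));
    for j = 1:d
        [s, order] = sort(Xtr(:, j));      % sorted training column
        Rtr(order, j) = (1:n)';            % rank of each training value
        for i = 1:size(Xva, 1)
            % one possible solution: count how many training values are smaller
            Rva(i, j) = sum(s <= Xva(i, j)) + 1;
        end
    end
    % (Ties are ignored here; a real prepro object should handle them.)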

Homework 2 (for November 8): Learn to make an entry into the feature selection challenge.
1) Download the datasets. They are available in ASCII format (see homework 1) or in Matlab format. Download the latest version of CLOP (see homework 1).
Linux users: build LibSVM (see homework 1). If you encounter problems with gcc, see the bug fix:
http://www.mathworks.com/support/solutions/data/1-QBCS1.html?solution=1-QBCS1
(thanks to Michael Wild for pointing that out.)
2) Download the sample code of the week from
    http://clopinet.com/isabelle/Projects/ETH/homework2.zip
    Run the sample code main.m.
3) Modify the baseline model (try to obtain better performance or fewer features).
4) Optionally, play with the data visualization routines (see the README file.)
5) Email the result zip file of the results to guyoni@inf.ethz.ch with subject "Homework 2" no later than:
    Tuesday November 8th.

Homework 3 (for November 15): Learn to make an entry into the feature selection challenge (continued).
1) Same data and software as homework 2.
2) Create a heatmap of the 100 top ranking features you selected.
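    % idx_feat is assumed to hold your feature ranking (best feature first);
    % idx_pat groups the training patterns by class so the heatmap shows both classes side by side.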
    [YS,idx_pat]=sort(D.train.Y); % Sort the patterns to group them by class
    cmat_display([D.train.X(idx_pat,idx_feat(1:100)), 500*(1+YS(:,ones(10,1)))]);

3) Make a scatter plot of the 3 top ranking features you selected.
    figure; scatterplot(D.train.X(:,idx_feat(1:3)), D.train.Y);
4) Email the result zip file with the figures to guyoni@inf.ethz.ch with subject "Homework3" no later than:
    Tuesday November 15th.

Homework 4 (for November 22): Write your own feature construction object.
1) Download the software for homework 4.
2) Try the examples in the README file.
3) Write your own feature construction object, inspired by the examples (a minimal convolution sketch is given at the end of this homework). Suggestions:
- a new smoothing kernel that is exponential (instead of Gaussian)
- a Mexican hat kernel.
- skeletonization with kernel convolution.
- a cosine or Hadamard transform.
- a filter bank built from kmeans clustering.
4) Email your preprocessing object to guyoni@inf.ethz.ch with subject "Homework 4" no later than:
            Tuesday November 22nd.
5) Optional: Chain the preprocessing you wrote with a learning object of your choice. Train it using Gisette training data.
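Relating to the kernel suggestions in step 3, here is a plain-Matlab convolution sketch (toy image, made-up kernel sizes; not a CLOP object):

    % Smoothing-kernel feature construction: convolve an example, viewed as an
    % image, with an exponential kernel and with a Mexican-hat-like kernel.
    img = rand(28, 28);                      % toy "digit" image; one example
    [u, v] = meshgrid(-3:3, -3:3);
    h_exp = exp(-sqrt(u.^2 + v.^2));         % exponential (non-Gaussian) kernel
    h_exp = h_exp / sum(h_exp(:));           % normalize to preserve intensity
    smoothed = conv2(img, h_exp, 'same');    % constructed features: smoothed pixels

    % A Mexican-hat-like kernel built as a difference of two Gaussians.
    g = @(s) exp(-(u.^2 + v.^2)/(2*s^2)) / (2*pi*s^2);
    h_hat = g(1) - g(2);
    edges = conv2(img, h_hat, 'same');       % band-pass features emphasizing strokes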

Homework 5 (for November 29): Make an entry for the Gisette dataset.
1) Download the software for homework 5.
2) Chain the preprocessing you wrote in homework 4 with a learning object of your choice. Train it using Gisette training data.
You can make a submission to the website of the challenge to get your test set score:
http://www.nipsfsc.ecs.soton.ac.uk/
3) Email the result zip file of the results to guyoni@inf.ethz.ch with subject "Homework5" no later than:
    Tuesday November 29th.
Reminder: you earn one point if you make a Gisette submission regardless of your score; you earn 2 points if you outperform the challenge winners.

Homework 6 (for December 6): Write a proposal for solving a problem involving feature extraction.
1) This exercise is about writing a proposal. You will not be required to actually carry out what you propose. Special care should be given to explaining the assessment method chosen.
2) Tips:
- Use something you have started in a previous homework on which you could elaborate.
- Write an outline.
- Explain things in simple words, using examples. Avoid unnecessary words.
More tips are found in the slides of Lecture 6.
3) Email your proposal to guyoni@inf.ethz.ch with subject "Homework 6" no later than:
    Tuesday December 6th.

Homework 7 (for December 13): Write a feature ranking filter.
1) Download the software for homework 7.
2) Taking inspiration from the examples, write a new feature ranking filter object. Choose one from Chapter 3 or invent your own.
3) Provide the p-value and FDR (using a tabulated distribution or the probe method).
4) Email a zip file with your object and a plot of the FDR to guyoni@inf.ethz.ch with subject "Homework7" no later than:
   Tuesday December 13th.
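One common way to implement the probe method is to append random "probe" features, rank everything with your filter, and estimate the false discovery rate from how many probes land above each rank. A plain-Matlab toy sketch of one such variant (not the CLOP implementation; the filter score here is |Pearson correlation|):

    % Random probe estimate of the FDR for a ranking filter.
    n = 100; d = 200;
    X = randn(n, d); y = sign(randn(n, 1));
    X(:, 1:10) = X(:, 1:10) + repmat(y, 1, 10);     % 10 truly relevant features

    np = d;                                          % add as many probes as features
    P = randn(n, np);                                % random probes: known to be irrelevant
    Z = [X P];
    Zc = Z - repmat(mean(Z), n, 1); yc = y - mean(y);
    score = abs(Zc'*yc) ./ (sqrt(sum(Zc.^2))' * norm(yc) + eps);

    [s, idx] = sort(score, 'descend');
    is_probe = (idx > d);                            % which ranked items are probes
    fdr = zeros(d+np, 1);
    for k = 1:(d+np)
        % expected false positives among the top k, estimated from the probe count
        fdr(k) = (sum(is_probe(1:k)) * d/np) / max(1, k - sum(is_probe(1:k)));
    end
    plot(fdr(1:50)); xlabel('rank k'); ylabel('estimated FDR among selected features');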

Homework 8 (for December 20): Make an entry for Dexter.
1) Download the software for homework 7.
2) Using the method you implemented for homework 7 or another method, try to outperform the baseline method on the Dexter dataset.
3) Email a zip file with your results to guyoni@inf.ethz.ch with subject "Homework8" no later than:
   Tuesday December 20th.

Homework 9: no new homework! Catch up, make an entry for Dexter.
1) Download the latest version of CLOP.
2) Tips to get better performance on Dexter:
- All the filters Ttest, Pearson, Ftest, aucfs (implemented in the correction of homework 8) perform similarly to s2n.
- Relief gives worse results.
- Stick to the baseline model (see homework 7).
- Use both training and validation set for training.
- Vary the number of features and select the best model by cross-validation.

Homework 10: Make an entry for Madelon.
1) Download the software for homework 10.
2) Using the Relief filter or another method, make a competitive entry for Madelon.
3) Optionally: Write the claims of a patent protecting one of the algorithms you have used or implemented.
4) Email a zip file with your results and patent claims to guyoni@inf.ethz.ch with subject "Homework10" no later than:
   Tuesday January 17th.

Homework 11: Make an entry for Madelon. (continued)
1) Same as homework 10. Additional tip: vary the number of features and optimize by CV.
2) Optionally: Write a review of one chapter/paper of the book (login: fextract, password:ws0506), that you have read. Use the review form provided.
3) Email a zip file with your results and review form to guyoni@inf.ethz.ch with subject "Homework11" no later than:
   Tuesday January 24th.

Homework 12: Make an entry for Arcene.
1) Download the software for homework 12.
2) Try to improve on the baseline method. Play with ensemble methods (e.g. implement Adaboost).
3) Email a zip file with your results and review form to guyoni@inf.ethz.ch with subject "Homework12" no later than:
   Tuesday January 31st.

Homework 13: Baseline methods and poster.
1) Download the package for homework 13 (includes software and template poster).
2) Modify the poster to include your own results.
3) Make an entry for Arcene, Dexter, Gisette, and Madelon to the website http://www.nipsfsc.ecs.soton.ac.uk/
4) Email the text of the poster to guyoni@inf.ethz.ch with subject "Homework13" no later than:
   Tuesday February 7th.

Homework 14: Kit to make your challenge entries and prepare a good poster.
1) Download the latest version of CLOP.
2) Download the package for homework 14.
3) Try to outperform the baseline method for Dorothea (Naive Bayes algorithm documentation).
4) Make an entry for all datasets to the website http://www.nipsfsc.ecs.soton.ac.uk/
5) Email the URL of the result and the zip file to guyoni@inf.ethz.ch with subject "Homework14" no later than:
    Tuesday February 28th.