IJCNN 2007 2-hour tutorial:
Feature selection and causal discovery
fundamentals and applications
Isabelle Guyon
August 12, 2007 Renaissance Orlando Resort, Florida
Variable and feature selection have become the focus of much research
in areas of application for which
datasets with tens or hundreds of thousands of variables are available. These
areas include text processing of internet documents, gene expression array
analysis, and combinatorial chemistry. The objective of variable selection
is three-fold: improving the prediction performance of the predictors, providing
faster and more cost-effective predictors, and providing a better understanding
of the underlying process that generated the data. This tutorial will cover
a wide range of aspects of such problems: providing a better definition of
the objective function, feature construction, feature ranking, multivariate
feature selection, efficient search methods, and feature validity assessment
methods.
Most feature selection methods do not attempt to uncover causal relationships
between the features and the target and focus instead on making the best predictions.
We will examine situations in which the knowledge of causal relationships
benefits feature selection. Such benefits may include: explaining relevance
in terms of causal mechanisms, distinguishing between actual features and
experimental artifacts, predicting the consequences of actions performed
by external agents, and making predictions in non-stationary environments.
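The difference between a predictive feature and a causal one can be illustrated with a small simulation. The sketch below assumes a hypothetical three-variable chain (cause -> target -> effect) with Gaussian noise; the variable names, noise levels, and the `simulate` helper are illustrative assumptions, not material from the tutorial. Both features correlate strongly with the target in observational data, but only the cause remains informative once an external agent sets the feature values.

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n, rng, manipulate=None):
    """Sample from the assumed chain cause -> target -> effect.

    manipulate=None      : observational data, natural mechanisms only
    manipulate="cause"   : an external agent sets the cause
    manipulate="effect"  : an external agent sets the effect
    """
    cause, target, effect = [], [], []
    for _ in range(n):
        # The agent draws the cause from a wider distribution than nature does.
        c = rng.gauss(0, 2) if manipulate == "cause" else rng.gauss(0, 1)
        # The target always responds to its cause.
        t = c + rng.gauss(0, 0.5)
        # A manipulated effect is cut off from the target.
        e = rng.gauss(0, 1) if manipulate == "effect" else t + rng.gauss(0, 0.5)
        cause.append(c)
        target.append(t)
        effect.append(e)
    return cause, target, effect

rng = random.Random(0)

c, t, e = simulate(2000, rng)
r_obs_cause, r_obs_effect = corr(c, t), corr(e, t)

c, t, e = simulate(2000, rng, manipulate="cause")
r_do_cause = corr(c, t)

c, t, e = simulate(2000, rng, manipulate="effect")
r_do_effect = corr(e, t)

print("observational: cause %.2f, effect %.2f" % (r_obs_cause, r_obs_effect))
print("manipulated:   cause %.2f, effect %.2f" % (r_do_cause, r_do_effect))
```

Under these assumptions, a purely predictive selection method would keep both features, yet only the cause helps predict the consequences of an action on the system.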
The objective of this tutorial is to provide a theoretical framework
and practical tools to address the problem of feature selection in a variety
of situations:
- when the number of samples and/or features varies across
several orders of magnitude,
- when the features or the target are binary, categorical,
or continuous,
- when the features are sparse or dense,
- when the classes are balanced or imbalanced,
- when the features are independent or correlated,
- when the data are clean or plagued with noise or
experimental artifacts,
- when the data are i.i.d. or when there are changes
in distribution, possibly resulting from interventions on the system
by external agents.
A variety of algorithms will be reviewed and contrasted. Particular attention will
be given to feature validity assessment methods, via statistical testing
and/or proper use of cross-validation.
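One common reading of "proper use of cross-validation" is that feature selection must be repeated inside each training fold; selecting features once on the full dataset lets the held-out labels leak into the estimate. The sketch below illustrates this on pure-noise data, where no feature carries real information. The correlation-based ranking, the nearest-centroid classifier, and all function names are illustrative assumptions, not the specific methods covered in the tutorial.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def rank_features(X, y, k):
    """Indices of the k features most correlated with the target."""
    scores = [abs_corr([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def nearest_centroid_predict(X_train, y_train, X_test, feats):
    """Assign each test row to the class with the closest training centroid."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        centroids[label] = [mean(r[j] for r in rows) for j in feats]

    def dist(row, label):
        return sum((row[j] - m) ** 2 for j, m in zip(feats, centroids[label]))

    return [min(centroids, key=lambda label: dist(row, label)) for row in X_test]

def cv_accuracy(X, y, k, n_folds=5, select_inside=True):
    """Cross-validated accuracy with selection inside or outside the folds."""
    idx = list(range(len(X)))
    # Selection on all samples sees the labels of every future test fold.
    global_feats = rank_features(X, y, k)
    correct = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        feats = rank_features(X_tr, y_tr, k) if select_inside else global_feats
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in test], feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(X)

# Pure noise: 40 samples, 500 features, labels unrelated to the features.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(40)]
y = [i % 2 for i in range(40)]

biased = cv_accuracy(X, y, k=10, select_inside=False)
honest = cv_accuracy(X, y, k=10, select_inside=True)
print("selection outside the folds: %.2f" % biased)
print("selection inside the folds:  %.2f" % honest)
```

On data like this, the estimate with selection outside the folds tends to look far better than chance, while the estimate with selection inside the folds stays near the 50% expected for random labels.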
Audience
This tutorial will be accessible to a broad audience, including practitioners,
researchers, and students who want to catch up with the latest developments in feature
selection. Some previous exposure to machine learning or data mining problems
is helpful but not required.
Links and resources
Contact information
Lecturer:
Isabelle Guyon
Clopinet Enterprises
955, Creston Road,
Berkeley, CA 94708, U.S.A.
Tel/Fax: (510) 524 6211
We are grateful to Health
Discovery Corporation for sponsoring this event.