Journal of Machine Learning Research
Special Issue on

Variable and Feature Selection

Guest Editors:

Isabelle Guyon and André Elisseeff


Submission deadline: May 15, 2002

The Journal of Machine Learning Research invites authors to submit papers for the Special Issue on Variable and Feature Selection. This special issue follows the NIPS 2001 workshop on the same topic, but is also open to contributions that were not presented there.

Background

Variable selection refers to the problem of selecting input variables that are most predictive of a given outcome. Variable selection problems arise in all machine learning tasks, supervised or unsupervised, in classification, regression, and time series prediction, whether two-class or multi-class, each posing its own level of challenge. Feature selection refers to the selection of an optimum subset of features derived from these input variables. Thus variable selection is distinct from feature selection when the predictor (classifier, regression machine, etc.) operates in a feature space that is not the input space. However, in the rest of this text we sometimes use the terms variable and feature interchangeably.

In recent years, variable selection has become the focus of considerable research in several application areas for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing, particularly of Internet documents, and genomics, particularly gene expression array data. The objective of variable selection is two-fold: improving the prediction performance of the predictors and providing a better understanding of the underlying concept that generated the data.

The mathematical statement of the problem is not widely agreed upon and may depend on the application. One typically distinguishes (i) the problem of discovering all the variables relevant to the concept (and determining how relevant they are and how they relate to one another) from (ii) the problem of finding a minimum subset of variables (or alternative subsets) that are useful to the predictor (i.e. that provide good generalization). But there are many variants of the statement. For some applications, intermediate products such as variable rankings, variable subset rankings, and search trees are particularly important. These intermediate products may be combined with other selection criteria from independent data sources. They may also allow the user to easily explore the tradeoff between inducer performance and feature set compactness. Determining an optimum number of features is then a separate model selection problem.

The nomenclature of approaches to the problem is not well established either. Methods assessing the quality of feature subsets according to the prediction error of a predictor are called wrapper methods. Those using criteria, such as correlation coefficients, that do not involve the inducer are called filter methods. But in reality there is a whole range of methods, including methods that embed feature selection in the learning algorithm. Other distinctions can be made according to whether the feature selection is supervised or unsupervised, and whether the inducer is multivariate or univariate.
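As a minimal sketch of the filter/wrapper distinction described above, the following example ranks variables with a correlation filter and, separately, runs greedy forward (wrapper) selection around a simple nearest-centroid predictor. The data, the predictor, the holdout split, and the subset size of three are all hypothetical choices made purely for illustration, not part of the call:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: m=100 examples, n=20 variables; only the first
# three variables actually carry information about the target.
m, n = 100, 20
X = rng.normal(size=(m, n))
y = np.sign(X[:, 0] + X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=m))

# Filter: rank variables by a criterion that ignores the predictor,
# here the absolute Pearson correlation with the target.
corr = np.abs([np.corrcoef(X[:, i], y)[0, 1] for i in range(n)])
filter_ranking = np.argsort(corr)[::-1]

def holdout_error(subset):
    """Wrapper criterion: holdout error of a nearest-centroid predictor
    trained on the first 70 examples, using only the given variables."""
    tr, te = slice(0, 70), slice(70, None)
    Xs = X[:, subset]
    mu_pos = Xs[tr][y[tr] > 0].mean(axis=0)
    mu_neg = Xs[tr][y[tr] < 0].mean(axis=0)
    pred = np.where(np.linalg.norm(Xs[te] - mu_pos, axis=1)
                    < np.linalg.norm(Xs[te] - mu_neg, axis=1), 1.0, -1.0)
    return float(np.mean(pred != y[te]))

# Wrapper: greedy forward selection, adding at each step the variable
# that most reduces the predictor's holdout error.
selected, remaining = [], list(range(n))
for _ in range(3):
    best = min(remaining, key=lambda i: holdout_error(selected + [i]))
    selected.append(best)
    remaining.remove(best)

print("filter top-3:", sorted(filter_ranking[:3].tolist()))
print("wrapper pick:", sorted(selected))
```

Note how the filter criterion never consults the predictor, while the wrapper re-trains and re-evaluates it for every candidate subset, which is why wrappers are typically far more expensive.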

Variable/feature selection problems are related to the problems of input dimensionality reduction and of parameter pruning. All these problems revolve around the capacity control of the predictor and are instances of the model selection problem. However, variable selection has practical and theoretical challenges of its own. From the practical point of view, eliminating variables may reduce the cost of producing the predictor and increase its speed, while space dimensionality reduction does not necessarily address these problems. Since selection methods involving exhaustive search are impractical for large numbers of variables, the elaboration of efficient and effective search procedures is crucial. Some methods yield feature subsets that are highly unstable under minor perturbations (elimination of some training examples or some variables, or introduction of noise on the variables), which may be undesirable when interpreting the results. From the theoretical point of view, the model selection problem of feature selection is notoriously hard. Even harder is the simultaneous selection of the features and the learning machine, or, in the case of unsupervised learning, the simultaneous selection of the features and the number of clusters. There is experimental evidence that greedy methods work better than exhaustive search, but a learning theoretic analysis of the underlying regularization mechanisms remains to be done. Other theoretical challenges include estimating with what confidence one can state that a feature is relevant to the concept when it is useful to the predictor, and providing a theoretical understanding of the stability of selected feature subsets. Finally, evaluating variable/feature selection methods themselves also poses challenges.

Topics

The aim of the special issue is to solicit and publish papers that provide a clear view of the state of the art in variable and feature selection. To facilitate cross-paper comparison and thus strengthen the special issue, authors are encouraged, but not obliged, to share datasets and code. Submitted datasets and code can be sent to the editors at any time and will be posted on the NIPS workshop web page. We will ask those who submit such datasets or code to sign a release freeing us from liability.

We emphasize that authors will not be judged solely in terms of raw performance, and this is not to be considered a competition: insight into the strengths and weaknesses of a given system is also deemed important. For papers presenting applications, benchmarks, or comparisons on toy problems, an assessment of the statistical significance of the results is expected. Because variable and feature selection is a multifaceted problem, authors are expected to properly define the mathematical statement of the problem they address and its purpose. In particular, it should be made clear whether the purpose is to discover variables relevant to the concept or to determine a subset of variables most useful to the predictor. In cases where this distinction is unclear, authors are expected to provide a discussion of the issue.

Instructions to the authors

Articles should be submitted electronically. PostScript and PDF formats are acceptable. Submissions should be single column, typeset in 11 pt font, and include all author contact information on the first page. We limit the length of articles submitted to this special issue to 40 pages, including figures and tables. See the author instructions at www.jmlr.org for more details. Articles may be accompanied by online appendices containing data, demonstrations, instructions for obtaining source code, or the source code itself if appropriate.

To submit a paper, send the standard emails requested by JMLR in its information for authors to submissions@jmlr.org (not to the editors directly), indicating in the subject header that the submission is intended for the Special Issue on Variable and Feature Selection.

The editors recommend that authors conform to the following definitions of some commonly used terms:
variable: input variable
feature: quantity derived from input variables
filter: selection method used as a preprocessing step that does not attempt to directly optimize the predictor performance
wrapper: selection method that directly optimizes the predictor performance
concept: target function/density having generated the training data
predictor: learning machine

The authors are also invited to conform to the following standard notation:
X a sample of input patterns (use capital italic style to designate sets)
F feature space
Y a sample of output labels
ln logarithm to base e
log2 logarithm to base 2
x.x'  inner product between vectors x and x'
||.|| Euclidean norm
n number of input variables
N number of features (if different from number of input variables)
m number of training examples
xk input vector, k=1...m
fk feature vector, k=1...m
xk,i  input vector elements, i=1...n
fk,i feature vector elements, i=1...n
yi target values, or (in pattern recognition) classes
w input weight vector or feature weight vector
wi weight vector elements, i=1...n or i=1...N
b constant offset (or threshold)
h VC dimension
F a concept space
f(.) a concept or target function
G a predictor space
g(.) a predictor function (real valued, or with values in {-1,1} for classification)
s(.) a nonlinear squashing function (e.g. sigmoid)
rf(x, y) margin function equal to y f(x)
l(x; y; f(x)) loss function
R(g) risk of g, i.e. expected fraction of errors
Remp(g) empirical risk of g, i.e. fraction of training errors
R(f) risk of f
Remp(f) empirical risk of f
k(x, x') Mercer kernel function (real valued)
A a matrix (use capital letters for matrices)
K matrix of kernel function values
αk Lagrange multipliers or pattern weights, k=1...m
α vector of all Lagrange multipliers
ξk slack variables, k=1...m
ξ vector of all slack variables
C regularization constant for SV Machines
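To make the risk notation above concrete, here is a small sketch computing the margin function y f(x) and the empirical risk Remp(g), the fraction of training errors, for a linear predictor. The data, weights, and offset are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear predictor f(x) = w.x + b on made-up data.
m, n = 50, 5                 # m training examples, n input variables
X = rng.normal(size=(m, n))  # X: sample of input patterns
w = rng.normal(size=n)       # w: weight vector
b = 0.0                      # b: constant offset
y = np.sign(X @ w + rng.normal(scale=0.5, size=m))  # classes in {-1, 1}

f = X @ w + b      # real-valued predictor output f(x)
margin = y * f     # margin function r_f(x, y) = y f(x)

# Empirical risk Remp: the fraction of training errors, i.e. the
# fraction of examples with a non-positive margin.
R_emp = float(np.mean(margin <= 0))
print("empirical risk Remp =", R_emp)
```

The true risk R(g), by contrast, is the expected fraction of errors over the underlying distribution and cannot be computed from the sample alone.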

Tips for a successful paper:
- The abstract should state: background, method, results, conclusions.
- The introduction should not paraphrase the abstract but rather develop the background and motivate the method.
- The conclusion should summarize the results, the main advantages and disadvantages, contrast with other methods, and propose further directions. There should always be a conclusion.
- An algorithm is often best described by pseudo-code or a flow chart.
- Avoid putting too many details in the main text. Create appendices for algorithmic details and derivations. Create a Discussion section for alternative approaches, side remarks, open questions, and connections to other methods.
- Avoid redundancy. Favor conciseness and precision, and refer to a technical memorandum or a web site for more details.
- Show figures or tables that must be compared on the same page.
- In general, be "nice" to the reader: be as clear as possible.

Instructions to the reviewers

Download the following instructions as an easy-to-complete Excel spreadsheet or as text.
 
A Summary Summarize briefly the contents of the paper:
B Questions Provide answers in text and grades on a scale 0 to 2 (0=worst, 2=best)
1 Scope Is the paper relevant to feature/variable selection? (0 1 2)
2 Novelty Does the material constitute a novel, non-obvious contribution to the field or, if it is tutorial in nature, does it review the field appropriately? (0 1 2)
3 Usefulness Are the methods, theories, and/or conclusions particularly useful (usefulness should be well supported by results)? (0 1 2)
4 Sanity Is the paper technically sound (good methodology, correct proofs, accurate and sufficient result analysis)? (0 1 2)
5 Quantity Does the paper contain enough interesting material? (0 1 2)
6 Reproducibility Are the methods introduced or considered sufficiently described to be implemented and/or to reproduce the results? (0 1 2)
7 Demonstration Has the efficiency, advantages, and/or drawbacks of the methods introduced or considered been sufficiently and convincingly demonstrated theoretically and/or experimentally? (0 1 2)
8 Comparison Has a sufficient method comparison been performed? (0 1 2)
9 Completeness Is the paper self contained, rather than referring to other publications extensively? (0 1 2)
10 Take-aways Does the paper clearly state its objectives (in the title, abstract, and introduction) and deliver on them (in the abstract, body of the text, and conclusion)? (0 1 2)
11 Bibliography Is the background properly described in the introduction and/or discussion, with an adequate bibliography? (0 1 2)
12 Outlook Are the results critically analyzed and further research directions outlined in a discussion or conclusion section? (0 1 2)
13 Data availability Are the data made available to other researchers? (0 1 2)
14 Code availability Is the implementation made available to other researchers? (0 1 2)
15 Readability Is the paper easily readable for machine learning experts interested in feature/variable selection? (0 1 2)
16 Notations Do the authors comply with the guidelines found in http://www.clopinet.com/isabelle/Projects/NIPS2001/call-for-papers.html particularly with respect to the notations? (0 1 2)
17 Figures Is the paper well and sufficiently illustrated by figures? (0 1 2)
18 Formalism Are the methods clearly formalized by a step by step procedure (e.g. algorithm pseudo-code or flow charts provided)? (0 1 2)
19 Density Is the length appropriate, relative to the contents? (0 1 2)
20 Language Is the English satisfactory? (0 1 2)
C Comments Add other detailed comments, corrections, and suggestions:

Anticipated Schedule

Submission deadline: May 15, 2002

Notification of acceptance: July 30, 2002

Final drafts: September 30, 2002

The schedule may be subject to revision. Prospective authors are invited to contact the editors ahead of time to facilitate the harmonization of the issue and to ensure that they are informed of any changes.

Further information

Please contact Isabelle Guyon with any queries.