Variable and Feature Selection
Guest Editors:
Submission deadline: May 15, 2002
The Journal of Machine Learning Research invites authors to submit papers to the Special Issue on Variable and Feature Selection. This special issue follows the NIPS 2001 workshop on the same topic, but is also open to contributions that were not presented there.
In recent years, variable selection has become the focus of much research in application areas for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing, particularly of Internet documents, and genomics, particularly gene expression array data. The objective of variable selection is two-fold: improving the prediction performance of the predictors and providing a better understanding of the underlying concept that generated the data.
The mathematical statement of the problem is not widely agreed upon and may depend on the application. One typically distinguishes (i) the problem of discovering all the variables relevant to the concept (and determining how relevant they are and how they relate to one another) from (ii) the problem of finding a minimum subset of variables (or alternative subsets) that are useful to the predictor (i.e. that provide good generalization). But there are many variants of the statement. For some applications, intermediate products such as variable rankings, variable subset rankings, and search trees are particularly important. These intermediate products may be combined with other selection criteria from independent data sources. They may also allow the user to explore easily the tradeoff between inducer performance and feature set compactness; determining an optimum number of features is then a separate model selection problem. The nomenclature of approaches to the problem is not well established either. Methods that assess the quality of feature subsets according to the prediction error of a predictor are called wrapper methods. Methods using criteria that do not involve the inducer, such as correlation coefficients, are called filter methods. In reality, however, there is a whole range of methods, including methods that embed feature selection in the learning algorithm. Other distinctions can be made according to whether the feature selection is supervised or unsupervised, and whether the inducer is multivariate or univariate.
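As a rough illustration of the filter side of this distinction, the sketch below ranks input variables by a filter criterion, the absolute Pearson correlation between each variable and the target, without training any predictor. It is a minimal sketch assuming numeric variables and a real-valued (or {-1, 1}) target; the names `pearson` and `filter_rank` are hypothetical, and indices are 0-based rather than the i=1...n used in the notation list.

```python
import math
from typing import List, Sequence


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    m = len(u)
    mean_u = sum(u) / m
    mean_v = sum(v) / m
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_u * var_v)


def filter_rank(X: List[List[float]], y: List[float]) -> List[int]:
    """Return variable indices sorted by |correlation with y|, best first.

    X has m rows (training examples) and n columns (input variables).
    No predictor is trained: the criterion is computed from the data
    alone, which is what makes this a filter rather than a wrapper.
    """
    n = len(X[0])
    scores = [abs(pearson([row[i] for row in X], y)) for i in range(n)]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# |corr| with y is about 1.0, 0.83 and 0.6 for the three columns.
X = [[1.0, 5.0, 0.2], [2.0, 3.0, 0.1], [3.0, 4.0, 0.4], [4.0, 1.0, 0.3]]
y = [1.0, 2.0, 3.0, 4.0]
print(filter_rank(X, y))  # [0, 1, 2]
```

Such a ranking is one of the intermediate products mentioned above; deciding how many top-ranked variables to keep remains a separate model selection problem.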
Variable/feature selection problems are related to the problems of input dimensionality reduction and of parameter pruning. All these problems revolve around the capacity control of the predictor and are instances of the model selection problem. However, variable selection has practical and theoretical challenges of its own. From the practical point of view, eliminating variables may reduce the cost of producing the predictor and increase its speed, whereas input space dimensionality reduction does not necessarily address these issues. Because exhaustive search is impractical for large numbers of variables, efficient and effective search procedures are crucial. Some methods yield feature subsets that are highly unstable under minor perturbations (elimination of some training examples or some variables, or introduction of noise on the variables), which may be undesirable when interpreting the results. From the theoretical point of view, the model selection problem posed by feature selection is notoriously hard. Even harder is the simultaneous selection of the features and the learning machine or, in the case of unsupervised learning, the simultaneous selection of the features and the number of clusters. There is experimental evidence that greedy methods work better than exhaustive search, but a learning theoretic analysis of the underlying regularization mechanisms remains to be done. Other theoretical challenges include estimating with what confidence one can state that a feature is relevant to the concept when it is useful to the predictor, and providing a theoretical understanding of the stability of selected feature subsets. Finally, rating variable/feature selection methods also poses challenges.
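The greedy search mentioned above can be illustrated by sequential forward selection used as a wrapper: starting from the empty set, add the variable whose inclusion most reduces an estimate of the predictor's error, and stop when no addition helps. The sketch below rests on stated assumptions: the predictor is a nearest-centroid classifier with labels in {-1, 1}, its error is estimated by leave-one-out on the training set, and the names `loo_error_nearest_centroid` and `forward_selection` are hypothetical. Any other predictor and error estimate could be passed through the `error` argument.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
ErrorFn = Callable[[Matrix, List[int], List[int]], float]


def loo_error_nearest_centroid(X: Matrix, y: List[int], subset: List[int]) -> float:
    """Leave-one-out error rate of a nearest-centroid classifier that only
    sees the variables listed in `subset` (a stand-in for any predictor)."""
    m = len(X)
    errors = 0
    for k in range(m):
        # Class centroids computed without the held-out example k.
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for j in range(m):
            if j == k:
                continue
            acc = sums.setdefault(y[j], [0.0] * len(subset))
            for t, i in enumerate(subset):
                acc[t] += X[j][i]
            counts[y[j]] = counts.get(y[j], 0) + 1

        def sq_dist(c: int) -> float:
            return sum((X[k][i] - sums[c][t] / counts[c]) ** 2
                       for t, i in enumerate(subset))

        # Predict the class of the nearest centroid; count a mistake.
        errors += min(sums, key=sq_dist) != y[k]
    return errors / m


def forward_selection(X: Matrix, y: List[int], error: ErrorFn,
                      max_vars: int) -> Tuple[List[int], float]:
    """Greedy forward selection: a wrapper, since each candidate subset
    is scored by the (estimated) error of the predictor itself."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best_err = float("inf")
    while remaining and len(selected) < max_vars:
        # Score every one-variable extension; ties go to the lower index.
        cand_err, cand = min((error(X, y, selected + [i]), i) for i in remaining)
        if cand_err >= best_err:
            break  # no strict improvement: stop searching
        best_err = cand_err
        selected.append(cand)
        remaining.remove(cand)
    return selected, best_err


# Variable 1 separates the classes; variable 0 is mostly noise.
X = [[0.5, 0.0], [0.1, 0.2], [0.9, 1.0], [0.4, 1.2]]
y = [-1, -1, 1, 1]
print(forward_selection(X, y, loo_error_nearest_centroid, max_vars=2))
# ([1], 0.0)
```

Because the same training data drive both the search and the error estimate, the returned error is optimistically biased; an independent test set would be needed to assess the selected subset.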
We emphasize that papers will not be judged solely on raw performance and that this is not a competition: insight into the strengths and weaknesses of a given system is also deemed important. For papers presenting applications, benchmarks, or comparisons on toy problems, an assessment of the statistical significance of the results is expected. Because variable and feature selection is a multifaceted problem, authors are expected to define properly the mathematical statement of the problem they are trying to solve and its purpose. In particular, it should be made clear whether the purpose is to discover the variables relevant to the concept or to determine the subset of variables most useful to the predictor. Where this distinction is unclear, authors are expected to discuss the issue.
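The call does not prescribe a particular significance test. As one common option when two predictors (for example, trained on two different feature subsets) are compared on the same test examples, the sketch below implements McNemar's test with continuity correction using only the Python standard library; the function name `mcnemar` is hypothetical. The p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so P(X > s) = erfc(sqrt(s/2)).

```python
import math
from typing import Sequence, Tuple


def mcnemar(y_true: Sequence[int], pred_a: Sequence[int],
            pred_b: Sequence[int]) -> Tuple[float, float]:
    """McNemar's test (with continuity correction) for two classifiers
    evaluated on the same examples. Returns (statistic, p-value)."""
    # Discordant pairs: only these carry information about the difference.
    n01 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n10 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    if n01 + n10 == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    stat = max(0, abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    # Chi-square with 1 degree of freedom: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# A errs on the first 10 examples, B is right everywhere.
y_true = [1] * 10 + [-1] * 10
pred_a = [-1] * 20
pred_b = [1] * 10 + [-1] * 10
stat, p = mcnemar(y_true, pred_a, pred_b)
print(stat)  # 8.1
```

The test assumes the test examples are independent and were used neither for training nor for feature selection.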
To submit a paper, send the emails required by the JMLR information for authors to submissions@jmlr.org (not to the editors directly), indicating in the subject header that the submission is intended for the Special Issue on Variable and Feature Selection.
The editors recommend that authors conform to the following definitions of some commonly used terms:
variable: input variable
feature: quantity derived from input variables
filter: selection method used as a preprocessing step that does not attempt to optimize the predictor performance directly
wrapper: selection method that optimizes the predictor performance directly
concept: target function/density that generated the training data
predictor: learning machine
Authors are also invited to use the following standard notations:
X a sample of input patterns (use capital italic style to designate sets)
F feature space
Y a sample of output labels
ln logarithm to base e
log2 logarithm to base 2
x.x' inner product between vectors x and x'
||.|| Euclidean norm
n number of input variables
N number of features (if different from number of input variables)
m number of training examples
xk input vector, k=1...m
fk feature vector, k=1...m
xk,i input vector elements, i=1...n
fk,i feature vector elements, i=1...N
yk target values, or (in pattern recognition) classes, k=1...m
w input weight vector or feature weight vector
wi weight vector elements, i=1...n or i=1...N
b constant offset (or threshold)
h VC dimension
F a concept space
f(.) a concept or target function
G a predictor space
g(.) a predictor function (real valued or with values in {-1, 1} for classification)
σ(.) a nonlinear squashing function (e.g. sigmoid)
ρf(x, y) margin function, equal to y f(x)
l(x; y; f(x)) loss function
R(g) risk of g, i.e. expected fraction of errors
Remp(g) empirical risk of g, i.e. fraction of training errors
R(f) risk of f
Remp(f) empirical risk of f
k(x, x') Mercer kernel function (real valued)
A a matrix (use capital letters for matrices)
K matrix of kernel function values
αk Lagrange multiplier or pattern weights, k=1...m
α vector of all Lagrange multipliers
ξk slack variables
ξ vector of all slack variables
C regularization constant for SV Machines
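For readers who prefer formulas, several of the quantities listed above can be written out as follows. This is a sketch assuming the 0-1 loss for classification with g taking values in {-1, 1}, writing y_k for the label of example x_k and P for the unknown distribution that generated the data; these conventions are assumptions, not part of the editors' list.

```latex
\begin{aligned}
\rho_f(x, y) &= y\, f(x) \\
R(g) &= \mathbb{E}_{(x, y) \sim P}\left[\mathbf{1}\{g(x) \neq y\}\right] \\
R_{\mathrm{emp}}(g) &= \frac{1}{m} \sum_{k=1}^{m} \mathbf{1}\{g(x_k) \neq y_k\} \\
K_{jk} &= k(x_j, x_k), \qquad j, k = 1, \dots, m
\end{aligned}
```

For g with values in {-1, 1}, g(x_k) differs from y_k exactly when the margin y_k g(x_k) is negative, which links the margin function to the count of training errors.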
Tips for a successful paper:
- The abstract should state: background, method, results, conclusions.
- The introduction should not paraphrase the abstract but rather develop the background and motivate the method.
- The conclusion should summarize the results and the main advantages and disadvantages, contrast the method with other methods, and propose further directions. There should always be a conclusion.
- An algorithm is often best described by pseudo-code or a flow chart.
- Avoid adding too many details in the text. Create appendices for algorithmic details and derivations. Create a discussion section for alternative approaches, side remarks, open questions, and connections to other methods.
- Avoid redundancy. Favor conciseness and precision, and refer to a technical memorandum or a web site for more details.
- Show Figures or Tables that have to be compared on the same page.
- In general be "nice" to the reader: be as clear as possible.
A | Summary | Summarize briefly the contents of the paper: |
B | Questions | Provide answers in text and grades on a scale from 0 to 2 (0=worst, 2=best) |
1 | Scope | Is the paper relevant to feature/variable selection? (0 1 2) |
2 | Novelty | Does the material constitute a novel, non-obvious contribution to the field or, if it is tutorial in nature, does it review the field appropriately? (0 1 2) |
3 | Usefulness | Are the methods, theories, and/or conclusions particularly useful (usefulness should be well supported by results)? (0 1 2) |
4 | Sanity | Is the paper technically sound (good methodology, correct proofs, accurate and sufficient result analysis)? (0 1 2) |
5 | Quantity | Does the paper contain enough interesting material? (0 1 2) |
6 | Reproducibility | Are the methods introduced or considered sufficiently described to be implemented and/or to reproduce the results? (0 1 2) |
7 | Demonstration | Have the efficiency, advantages, and/or drawbacks of the methods introduced or considered been sufficiently and convincingly demonstrated theoretically and/or experimentally? (0 1 2) |
8 | Comparison | Has a sufficient method comparison been performed? (0 1 2) |
9 | Completeness | Is the paper self-contained, rather than referring to other publications extensively? (0 1 2) |
10 | Take-aways | Does the paper clearly state its objectives (in the title, abstract, and introduction) and deliver them (in the abstract, body of the text, and conclusion)? (0 1 2) |
11 | Bibliography | Is the background properly described in the introduction and/or discussion, with an adequate bibliography? (0 1 2) |
12 | Outlook | Are the results critically analyzed and further research directions outlined in a discussion or conclusion section? (0 1 2) |
13 | Data availability | Are the data made available to other researchers? (0 1 2) |
14 | Code availability | Is the implementation made available to other researchers? (0 1 2) |
15 | Readability | Is the paper easily readable for machine learning experts interested in feature/variable selection? (0 1 2) |
16 | Notations | Do the authors comply with the guidelines found in http://www.clopinet.com/isabelle/Projects/NIPS2001/call-for-papers.html particularly with respect to the notations? (0 1 2) |
17 | Figures | Is the paper adequately illustrated by figures? (0 1 2) |
18 | Formalism | Are the methods clearly formalized by a step-by-step procedure (e.g. algorithm pseudo-code or flow charts)? (0 1 2) |
19 | Density | Is the length appropriate, relative to the contents? (0 1 2) |
20 | Language | Is the English satisfactory? (0 1 2) |
C | Comments | Add other detailed comments, corrections and suggestions: |
Notification of acceptance: July 30, 2002
Final drafts: September 30, 2002
The schedule may be subject to revision. Prospective authors are invited to make themselves known to the editors ahead of time, to facilitate the harmonization of the issue and to ensure that they will be informed of any change.