
- Kernel basis functions.

3. What is the relation between ensemble methods and Bayesian learning?

In the Bayesian framework, each model f is one possible explanation of the data, and models are "marginalized out" when making predictions: P(y|x, D) = sum_f P(y|x, f) P(f|D). The posterior P(f|D) weights the vote of each model, so the Bayesian prediction is a weighted majority over all models.

MCMC stands for Markov Chain Monte Carlo. MCMC methods "sample" from probability distributions; they can be used to train ensembles in the Bayesian sense by sampling models from the posterior distribution P(f|D). In this way, we get an empirical estimate of the weighted majority vote.
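A minimal sketch of this idea, assuming a toy 1-D logistic model P(y=1|x) = sigmoid(w*x) with a standard-normal prior on w (the data and model are hypothetical; any posterior sampler would do in place of this random-walk Metropolis loop):

```python
import math
import random

# Toy data: 1-D inputs with binary labels (hypothetical example).
X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
Y = [0, 0, 0, 1, 1, 1]

def log_posterior(w):
    """Log posterior of a logistic model P(y=1|x) = sigmoid(w*x),
    with a standard-normal prior on w."""
    lp = -0.5 * w * w  # log prior
    for x, y in zip(X, Y):
        p = 1.0 / (1.0 + math.exp(-w * x))
        lp += math.log(p if y == 1 else 1.0 - p)
    return lp

def metropolis(n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampling of P(w|D)."""
    rng = random.Random(seed)
    w, samples = 0.0, []
    for _ in range(n_samples):
        w_new = w + rng.gauss(0.0, step)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < log_posterior(w_new) - log_posterior(w):
            w = w_new
        samples.append(w)
    return samples

def predict(x, samples):
    """Bayesian prediction: average the vote of every sampled model,
    i.e. an empirical estimate of the weighted majority."""
    votes = [1.0 / (1.0 + math.exp(-w * x)) for w in samples]
    return sum(votes) / len(votes)

samples = metropolis(2000)
```

Each sample of w is one member of the ensemble; averaging their predictions approximates the marginalization over models.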

Bagging is the bootstrap method applied to learning ensembles of classifiers. Each base learner is trained on a resampled training set, obtained by sampling with replacement m examples from a training set of size m. All base learners vote with the same weight.
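This procedure can be sketched as follows (the threshold-rule base learner is a hypothetical stand-in for any classifier):

```python
import random

def bagging_fit(train, fit_base, n_learners=25, seed=0):
    """Train an ensemble of base learners, each on a bootstrap
    resample (m draws with replacement from the m examples)."""
    rng = random.Random(seed)
    m = len(train)
    ensemble = []
    for _ in range(n_learners):
        boot = [train[rng.randrange(m)] for _ in range(m)]
        ensemble.append(fit_base(boot))
    return ensemble

def bagging_predict(ensemble, x):
    """Unweighted majority vote of all base learners."""
    votes = [h(x) for h in ensemble]
    return max(set(votes), key=votes.count)

# Hypothetical base learner: a one-threshold rule on 1-D data.
def fit_stump(data):
    best = None
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            err = sum((1 if sign * (x - t) > 0 else 0) != y
                      for x, y in data)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) > 0 else 0
```

The bootstrap resampling makes each base learner see a slightly different training set, which is what de-correlates their votes.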

The out-of-bag error estimate is obtained by computing, for each base learner, the errors made on the examples not used for its training (the out-of-bag examples), then averaging the results over all base learners. The idea extends to feature importance: randomly permute the values of one feature in the out-of-bag examples and compute the difference in error rate between the unperturbed and perturbed data. This difference may be used as a ranking index for features. If it is normalized by its standard error, it is approximately normally distributed, which provides a means of computing a p-value.
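The permutation step can be sketched as a small helper (the predictor and out-of-bag data passed in are hypothetical placeholders):

```python
import random

def permutation_importance(predict, X_oob, y_oob, feature, seed=0):
    """Difference in error rate on out-of-bag examples before and
    after randomly permuting the values of one feature."""
    rng = random.Random(seed)
    # Error rate on the unperturbed out-of-bag examples.
    err = sum(predict(x) != y for x, y in zip(X_oob, y_oob)) / len(y_oob)
    # Permute the chosen feature column.
    col = [x[feature] for x in X_oob]
    rng.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:]
              for x, v in zip(X_oob, col)]
    # Error rate on the perturbed data; the difference is the index.
    err_perm = sum(predict(x) != y
                   for x, y in zip(X_perm, y_oob)) / len(y_oob)
    return err_perm - err
```

A feature the classifier ignores yields a difference of zero; a feature it relies on yields a positive difference.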

Random Forests are ensembles of tree classifiers. Each base learner is trained on a bootstrap sample. Additional variability is introduced by splitting nodes in the tree on the basis of a random subset of the original features, not all the features. Typically, if there is a total of N features, one uses sqrt(N) features at each split.
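The random feature subset drawn at each node split can be sketched as (function name is hypothetical):

```python
import math
import random

def candidate_features(n_features, rng):
    """Pick the sqrt(N) candidate features considered at one
    tree-node split, sampled without replacement."""
    k = max(1, int(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)
```

A fresh subset is drawn at every node, so different parts of each tree, and different trees, end up using different features.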

At each node, the feature selected to split the data is the one providing the largest information gain, i.e. the one having the largest mutual information with the target. An index is obtained for each feature by adding the information gains it did or would provide at each node split. An average index can be obtained by averaging the indices over several trees in a forest.
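For a discrete feature, the information gain at a node is the entropy of the target minus the weighted entropy of the target within each feature value, which can be computed as:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """Mutual information between a discrete feature and the target:
    H(y) minus the conditional entropy H(y | feature value)."""
    n = len(labels)
    cond = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond
```

A feature that splits the node into pure subsets gains the full entropy of the target; a feature independent of the target gains nothing.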

Boosting is a method for training an ensemble by adding base learners in sequence. Each newly added base learner is trained with a higher weight on the examples that were hard to learn up to then.

One can use decision stumps as base learners. A decision stump is a classifier built from one variable (like the root node of a tree classifier). Boosting with decision stumps is a form of forward selection of variables.
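A minimal sketch of boosting with decision stumps, in the style of AdaBoost (labels are taken as +1/-1; the exponential re-weighting scheme is one standard choice, not prescribed by the text above):

```python
import math

def fit_stump(X, y, w):
    """Best weighted decision stump: one feature, one threshold,
    one direction (the root node of a tree classifier)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for s in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (s if x[j] > t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    err, j, t, s = best
    return err, (lambda x, j=j, t=t, s=s: s if x[j] > t else -s)

def adaboost(X, y, rounds=10):
    """Add stumps in sequence, re-weighting the examples so each
    new stump focuses on those misclassified so far."""
    m = len(X)
    w = [1.0 / m] * m
    ensemble = []
    for _ in range(rounds):
        err, h = fit_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Increase the weight of misclassified examples.
        w = [wi * math.exp(-alpha * yi * h(x))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

Because each round effectively selects one variable (the stump's feature), the sequence of selected features acts as a forward selection of variables.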

- Averaging ranking indices to get a new global index

- Averaging ranks to get a new global index

- Intersecting feature subsets (a soft intersection may be defined by setting a threshold on the number of times a feature appears across the subsets considered)

- Computing the "centroid" of feature subsets (the subset intersecting most with all others)

- Computing the "centroid" of feature rankings (the ranking closest to all other rankings)
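Two of these aggregation schemes, rank averaging and soft intersection, can be sketched as follows (feature names and the count threshold are hypothetical):

```python
from collections import Counter

def average_ranks(rankings):
    """Combine several feature rankings (lists of feature names,
    best first) by averaging each feature's rank position."""
    features = rankings[0]
    mean = {f: sum(r.index(f) for r in rankings) / len(rankings)
            for f in features}
    return sorted(features, key=lambda f: mean[f])

def soft_intersection(subsets, min_count):
    """Soft intersection of feature subsets: keep the features
    appearing in at least min_count of the subsets."""
    counts = Counter(f for s in subsets for f in s)
    return {f for f, c in counts.items() if c >= min_count}
```

Setting min_count to the number of subsets recovers the plain intersection; lowering it relaxes the consensus requirement.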