Stability of bagged decision trees
Yves Grandvalet
IDIAP, Switzerland
Bagging is a simple ensemble technique in which an estimator is produced by
averaging predictors fitted to bootstrap samples. Bagged decision trees almost
consistently improve on the original predictor, and it is widely believed
that bagging is effective thanks to the variance reduction stemming from
averaging predictors. We provide here a counter-example, and we give experimental
evidence supporting the view that bagging stabilizes prediction by equalizing the
influence of training examples. The influence of near-boundary points is
increased when they participate in the definition of the split location of
any node. Highly influential examples, which carry a high weight in deciding
the split direction near the root node, are down-weighted owing to their absence
from some of the bootstrap samples. Recent analyses relating stability to generalization
error are tested empirically, to see whether they account for bagging's
success. We quantify hypothesis stability on several benchmarks, and conclude
that the influence equalization process significantly improves stability,
which in turn may improve generalization performance. Our experiments
furthermore suggest that the generalization bounds based
on the stability analysis are quite tight for unbagged and bagged decision
trees.
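
To make the procedure concrete, the following is a minimal sketch of bagged decision trees by majority vote, assuming scikit-learn's DecisionTreeClassifier; the synthetic dataset, the number of bootstrap replicates B, and the random seeds are arbitrary choices for illustration, not values from the paper.

```python
# Illustrative sketch of bagging: fit one tree per bootstrap sample, then
# aggregate the predictions. Each bootstrap sample draws n points with
# replacement, so roughly 37% of the training examples are absent from any
# given replicate, which is what down-weights highly influential examples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

B = 50  # number of bootstrap replicates (arbitrary for this example)
n = len(X_train)
votes = np.zeros((B, len(X_test)))

for b in range(B):
    # Bootstrap sample: n indices drawn uniformly with replacement.
    idx = rng.integers(0, n, size=n)
    tree = DecisionTreeClassifier(random_state=b)
    tree.fit(X_train[idx], y_train[idx])
    votes[b] = tree.predict(X_test)

# Aggregate the B predictors: majority vote for binary 0/1 labels.
y_bagged = (votes.mean(axis=0) >= 0.5).astype(int)

single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("single tree accuracy:", (single.predict(X_test) == y_test).mean())
print("bagged trees accuracy:", (y_bagged == y_test).mean())
```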