An ensemble of tree ensembles and neural networks.
Louis Duclos-Gosselin

The 2007 Agnostic Learning vs. Prior Knowledge Challenge allows me to illustrate one of my personal algorithms on several datasets. Algorithms of this kind earned me a good score at PAKDD 2007 (30th place). I propose a special case of a mixed ensemble of boosted trees and neural networks, in which a single tree is used to tune the settings of both the boosted trees and the neural networks. First, a combination of the Gini, entropy, and misclassification criteria is used to construct a single tree. This single tree, in conjunction with a genetic algorithm, sets the parameters of the ensemble method (category weights, misclassification costs, variable weights, maximum number of categories for continuous predictors, minimum node size to split, use of surrogate splitters for missing values, tree-pruning and validation method, tree-pruning criterion). Second, genetic algorithms, wrapper techniques, link analysis, SOM, clustering techniques, and filter techniques allow me to choose the best predictors for the ensemble methods. Third, a special case of gradient boosting is constructed with the single tree's settings. In addition, annealing techniques are used to choose the best neural-network architecture (SVM, RBF, Bayes networks, cascade correlation, projection pursuit), and the parameters of those networks (learning algorithm and its parameters, number of neurons and hidden layers, activation function) are set with a genetic algorithm. Finally, the ensemble itself is constructed; this is the key part of the process. Depending on the managers' goal (classification or ranking), a minimization criterion is chosen and various techniques are used to aggregate the tree ensemble and the neural networks.
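The steps above can be sketched in a minimal form. This is not the author's exact pipeline (the genetic-algorithm tuning and annealing search are omitted, and every parameter value here is an illustrative assumption): a single Gini tree is fit first and its depth seeds the boosting configuration, then the tree ensemble and a neural network are aggregated by averaging their class probabilities.

```python
# Illustrative sketch only: a probe tree guides the boosting settings,
# then boosted trees and a neural network are averaged. All parameter
# choices (depth cap, hidden layer size, etc.) are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Step 1: fit a single tree (Gini criterion) and read off its depth
# to seed the boosting configuration.
probe = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
depth = min(probe.get_depth(), 4)

# Step 3: gradient boosting configured from the probe tree's setting.
gbt = GradientBoostingClassifier(max_depth=depth, random_state=0).fit(X, y)

# A neural-network member of the ensemble (architecture is illustrative).
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                    random_state=0).fit(X, y)

# Final step: aggregate the tree ensemble and the network by
# averaging predicted class probabilities.
proba = (gbt.predict_proba(X) + net.predict_proba(X)) / 2.0
pred = proba.argmax(axis=1)
```

A ranking goal would instead sort examples by `proba[:, 1]`; the averaging rule is only one of the aggregation techniques the text alludes to.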
In conclusion, this kind of algorithm has several interesting properties: it resists overfitting, because k-fold cross-validation and genetic algorithms are used throughout the process to keep overlearning as low as possible, and it is particularly powerful on problems with a small number of categories.
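The overfitting control described above amounts to scoring every candidate configuration by k-fold cross-validation rather than on the training data. A minimal sketch (the model and the choice of k = 5 are illustrative assumptions, not the author's settings):

```python
# Sketch of the k-fold check: each candidate model is scored on
# held-out folds, so training-set fit never drives the selection.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=15, random_state=1)
scores = cross_val_score(GradientBoostingClassifier(random_state=1),
                         X, y, cv=5)  # one accuracy per fold
cv_accuracy = scores.mean()
```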

-    Preprocessing or feature construction: Optimal binning, Standardize, Maximize normality
-    Feature selection approach: Filter, Wrapper, Link analysis, SOM, Clustering technique
-    Feature selection engine: Relief, Information theory, Mutual information, Chi-squared (χ²), Single tree
-    Feature selection search: Annealing, Genetic algorithm
-    Feature selection criterion: K-fold cross-validation
-    Classifier: Neural networks, Tree classifier, Ensemble of trees, SVM, RBF, Bayes networks, Cascade correlation, Projection pursuit
-    Hyper-parameter selection: Grid search, Pattern search, K-fold cross-validation, Genetic algorithm
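One concrete instance of the hyper-parameter selection listed above is a cross-validated grid search over boosting parameters. The grid values and cv = 3 below are illustrative assumptions, and the genetic-algorithm and pattern-search variants are not shown:

```python
# Grid search with k-fold cross-validation over two boosting
# hyper-parameters; the candidate values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=15, random_state=2)
grid = {"max_depth": [2, 3], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(GradientBoostingClassifier(random_state=2),
                      grid, cv=3)  # 4 candidates x 3 folds
search.fit(X, y)
best = search.best_params_  # best combination found by CV score
```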