An Ensemble of Tree Ensembles and Neural Networks
Louis Duclos-Gosselin


The 2007 Agnostic Learning vs. Prior Knowledge Challenge gave me the opportunity to demonstrate one of my personal algorithms on several datasets. An algorithm of this kind earned me 30th place in the PAKDD 2007 competition. I propose a special case of a mixed ensemble of boosted trees and neural networks, in which a single tree is used to tune the settings of both the boosted trees and the neural networks. First, a combination of the Gini, entropy, and misclassification criteria is used to construct a single tree. This single tree, in conjunction with a genetic algorithm, sets the parameters of the ensemble method (category weights, misclassification costs, variable weights, maximum categories for continuous predictors, minimum node size to split, use of surrogate splitters for missing values, tree pruning and validation method, tree pruning criterion). Second, genetic algorithms, wrapper techniques, link analysis, SOM, clustering techniques, and filter techniques are used to choose the best predictors for the ensemble methods. Third, a special case of gradient boosting is constructed using the single tree's settings. In addition, annealing techniques are used to choose the best neural network architecture (SVM, RBF, Bayes networks, cascade correlation, projection pursuit), and the parameters of these networks are set with a genetic algorithm (learning algorithm and its parameters, number of neurons and hidden layers, activation functions). Finally, the ensemble method itself is constructed; this is the most important part of the process. Depending on the manager's goal (classification or ranking), a minimization criterion is chosen and various techniques are used to aggregate the tree ensemble and the neural networks. In conclusion, this kind of algorithm has several interesting properties: it resists overfitting, because k-fold cross-validation and genetic algorithms are used throughout the process to keep overlearning as low as possible, and it is particularly powerful on small-category problems.
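As a rough illustration of the pipeline's skeleton, the sketch below uses scikit-learn as a stand-in for the tools used in the actual entry: an exhaustive grid search with k-fold cross-validation replaces the genetic and annealing searches, a multilayer perceptron stands in for the search over network families, and a simple weighted average over out-of-fold probabilities replaces the more elaborate aggregation step. All parameter grids and model choices here are illustrative assumptions, not the settings used in the challenge.

```python
# Minimal sketch of the pipeline: tune a single tree, reuse its settings for a
# boosted ensemble, train a neural network, then blend the two models with the
# weight that minimizes the chosen criterion. Illustrative only; grid search
# stands in for the genetic/annealing searches described in the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: tune a single tree by k-fold CV over its impurity criterion and size limits.
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"criterion": ["gini", "entropy"],
                "max_depth": [3, 5, 8],
                "min_samples_split": [2, 10, 50]},
    cv=5,
)
tree_search.fit(X_train, y_train)
best = tree_search.best_params_

# Step 3: reuse the single tree's settings to configure the gradient-boosted ensemble.
gbm = GradientBoostingClassifier(max_depth=best["max_depth"],
                                 min_samples_split=best["min_samples_split"],
                                 random_state=0)

# A small neural network; its architecture and learning parameters would be searched too.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)

# Final step: pick the blend weight that minimizes the chosen criterion (here
# log loss, a classification-style objective) on out-of-fold predictions, so
# the aggregation is not tuned on training-set scores.
p_gbm = cross_val_predict(gbm, X_train, y_train, cv=5, method="predict_proba")[:, 1]
p_net = cross_val_predict(net, X_train, y_train, cv=5, method="predict_proba")[:, 1]
weights = np.linspace(0, 1, 21)
losses = [log_loss(y_train, w * p_gbm + (1 - w) * p_net) for w in weights]
w_best = weights[int(np.argmin(losses))]

gbm.fit(X_train, y_train)
net.fit(X_train, y_train)
p_test = (w_best * gbm.predict_proba(X_test)[:, 1]
          + (1 - w_best) * net.predict_proba(X_test)[:, 1])
print(f"blend weight={w_best:.2f}, test log loss={log_loss(y_test, p_test):.3f}")
```

For a ranking goal, the same skeleton applies with a ranking criterion (e.g., area under the ROC curve) substituted for log loss when choosing the blend weight.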

Keywords:
-    Preprocessing or feature construction: optimal binning, standardize, maximize normality
-    Feature selection approach: filter, wrapper, link analysis, SOM, clustering techniques
-    Feature selection engine: Relief, information theory, mutual information, χ² (chi-square), single tree
-    Feature selection search: annealing, genetic algorithm
-    Feature selection criterion: k-fold cross-validation
-    Classifier: neural networks, tree classifier, ensemble of trees, SVM, RBF, Bayes networks, cascade correlation, projection pursuit
-    Hyper-parameter selection: grid search, pattern search, k-fold cross-validation, genetic algorithm (see the sketch below)
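Since the keywords list a genetic algorithm scored by k-fold cross-validation as a hyper-parameter selection method, here is a minimal sketch of that idea, again using scikit-learn. The gene encoding, population size, selection scheme, and mutation rule are all illustrative assumptions, not the ones used in the entry.

```python
# Toy genetic hyper-parameter search: genomes index into small parameter
# grids, and fitness is the mean k-fold cross-validation accuracy.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=15, random_state=0)
rng = random.Random(0)

DEPTHS, SPLITS, CRITERIA = [2, 3, 5, 8, 12], [2, 5, 10, 50], ["gini", "entropy"]
GRIDS = [DEPTHS, SPLITS, CRITERIA]

def random_genome():
    return [rng.randrange(len(grid)) for grid in GRIDS]

def fitness(g):
    clf = DecisionTreeClassifier(max_depth=DEPTHS[g[0]],
                                 min_samples_split=SPLITS[g[1]],
                                 criterion=CRITERIA[g[2]],
                                 random_state=0)
    return cross_val_score(clf, X, y, cv=5).mean()  # k-fold CV as the criterion

def mutate(g):
    g = list(g)
    i = rng.randrange(len(GRIDS))      # re-draw one gene at random
    g[i] = rng.randrange(len(GRIDS[i]))
    return g

def crossover(a, b):
    cut = rng.randrange(1, len(GRIDS))  # one-point crossover
    return a[:cut] + b[cut:]

population = [random_genome() for _ in range(12)]
for generation in range(8):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:4]                # truncation selection: keep the best four
    children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("best:", DEPTHS[best[0]], SPLITS[best[1]], CRITERIA[best[2]],
      f"CV accuracy={fitness(best):.3f}")
```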