Victor Eruhimov, Vladimir Martyanov, Eugene Tuv

Building a universal tool for data classification poses many challenges for researchers. One such tool is a serial ensemble of decision trees in which each successive tree is fitted to the error of the current ensemble (so-called Gradient Boosted Trees, GBT). This ensemble also provides a natural and efficient way to estimate the influence of each predictor variable on the response in a given dataset. We propose a fast method for building accurate tree-based models that applies a decision-tree feature-weighting algorithm at each step of the greedy algorithm. Each variable is assigned an importance weight: the higher the weight, the more likely the variable is to be considered as a candidate when a split is computed. The weights are recalculated dynamically at each step, taking their previous values into account, so that no single variable becomes overweighted.
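The abstract does not give the exact weighting or update rule, so the following is only a minimal sketch of the idea under stated assumptions. It assumes (1) candidate split variables are drawn without replacement with probability proportional to their current weights, and (2) after each tree, the weights are blended with the newly observed, normalized importances (a hypothetical exponential-moving-average rule with mixing rate `alpha`), with a small `floor` so every variable stays selectable. The names `sample_candidates`, `update_weights`, `alpha`, and `floor` are illustrative, not taken from the authors' implementation.

```python
import random

def sample_candidates(weights, k, rng):
    """Draw k distinct variable indices; each draw is proportional to the current weights."""
    pool = list(range(len(weights)))
    chosen = []
    for _ in range(min(k, len(pool))):
        total = sum(weights[i] for i in pool)
        r = rng.random() * total
        acc = 0.0
        for pos, i in enumerate(pool):
            acc += weights[i]
            if r < acc:
                break
        # Remove the picked variable so it cannot be drawn twice in this step.
        chosen.append(pool.pop(pos))
    return chosen

def update_weights(weights, importance, alpha=0.5, floor=1e-3):
    """Hypothetical update: blend previous weights with the latest normalized importances.

    Keeping part of the previous weights (instead of replacing them) limits how fast a
    single variable can dominate; the floor keeps every variable selectable.
    """
    total = sum(importance) or 1.0
    new = [(1 - alpha) * w + alpha * imp / total for w, imp in zip(weights, importance)]
    new = [max(x, floor) for x in new]
    s = sum(new)
    return [x / s for x in new]
```

For example, starting from uniform weights over four variables and observing that only variable 0 contributed importance in the last tree, `update_weights([0.25] * 4, [10, 0, 0, 0])` raises variable 0 to 0.625 rather than 1.0, so the other variables still have a chance of being sampled by `sample_candidates` at the next step.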

The predictive power of the ensemble depends considerably on the choice of several real-valued training parameters. We propose to select these parameters with an algorithm based on particle filtering with simulated annealing that minimizes the cross-validation error of the classifier.
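The abstract names the technique but not its details, so the following is a minimal sketch of one common way to combine particle filtering with simulated annealing, not the authors' algorithm. Each particle is a vector of real-valued training parameters. Particles are weighted by a Boltzmann factor of their error at the current temperature, resampled in proportion to those weights, and perturbed with Gaussian noise whose scale shrinks as the temperature cools. The `objective` argument stands in for the classifier's cross-validation error; the function name `anneal_particles` and all default settings are assumptions chosen for illustration.

```python
import math
import random

def anneal_particles(objective, bounds, n_particles=30, n_iter=40,
                     t0=1.0, cooling=0.9, step=0.1, seed=0):
    """Minimize objective over a box of real-valued parameters (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialize particles uniformly inside the parameter bounds.
    particles = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    best, best_err = None, float("inf")
    temp = t0
    for _ in range(n_iter):
        errs = [objective(p) for p in particles]
        for p, e in zip(particles, errs):
            if e < best_err:
                best, best_err = list(p), e
        # Boltzmann weights: lower error -> higher weight; subtracting the minimum avoids underflow of all weights.
        e_min = min(errs)
        w = [math.exp(-(e - e_min) / temp) for e in errs]
        # Multinomial resampling in proportion to the weights.
        parents = rng.choices(particles, weights=w, k=n_particles)
        # Perturb each resampled particle; noise scales with temperature and is clipped to the bounds.
        particles = [
            [min(max(x + rng.gauss(0.0, step * (hi - lo) * temp), lo), hi)
             for x, (lo, hi) in zip(p, bounds)]
            for p in parents
        ]
        temp *= cooling  # Geometric cooling schedule.
    return best, best_err

if __name__ == "__main__":
    # Toy quadratic standing in for cross-validation error (hypothetical).
    toy_cv_error = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 2.0) ** 2
    print(anneal_particles(toy_cv_error, [(0.0, 1.0), (0.0, 5.0)]))
```

High temperatures keep the population diverse early on, while cooling concentrates it around low-error parameter settings. In practice each call to `objective` would train and cross-validate the ensemble, so `n_particles * n_iter` is the main cost to budget.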

We regard this tool as a universal method for versatile data analysis.