Data Grid Models in the Agnostic Learning vs. Prior Knowledge Challenge
Marc Boullé, France Telecom R&D, marc.boulle@orange-ftgroup.com
We introduce a new method to automatically, rapidly and reliably evaluate
the predictive information of any subset of variables in supervised learning.
It is based on partitioning each input variable, into intervals in the
numerical case and into groups of values in the categorical case. The cross-product
of the univariate partitions forms a multivariate partition of the input
representation space into a set of cells. This multivariate partition, called
a data grid, makes it possible to evaluate the correlation between the input
variables and the output variable. The best data grid is searched for using a
Bayesian model selection approach and combinatorial algorithms. Three classification
techniques that exploit data grids in different ways are presented and evaluated
in the Agnostic Learning vs. Prior Knowledge Challenge. These preliminary
experiments demonstrate the value of using data grids in machine learning
tasks.
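
To make the notion of a data grid concrete, the following minimal Python sketch builds the cells of a grid from given univariate partitions and counts the output classes in each cell. It only illustrates the cross-product construction. The Bayesian model selection criterion and the combinatorial search for the best grid are not shown. The function names, the toy data and the chosen partitions are assumptions made for illustration, not the author's implementation.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def interval_index(value, cut_points):
    """Return the index of the interval containing value.

    cut_points is a sorted list of interval boundaries (hypothetical format).
    """
    return bisect_right(cut_points, value)


def build_data_grid(rows, labels, partitions):
    """Count output classes in each cell of the cross-product partition.

    partitions maps a column index to either a sorted list of cut points
    (numerical variable) or a dict value -> group id (categorical variable).
    """
    grid = defaultdict(Counter)
    for row, label in zip(rows, labels):
        cell = tuple(
            part[row[col]] if isinstance(part, dict)
            else interval_index(row[col], part)
            for col, part in sorted(partitions.items())
        )
        grid[cell][label] += 1
    return grid


# Toy data, invented for illustration: one numerical and one categorical input.
rows = [(1.0, "red"), (2.5, "blue"), (7.0, "red"), (8.2, "green"), (9.1, "blue")]
labels = ["a", "a", "b", "b", "b"]

# Assumed partitions: two intervals split at 5.0, and two groups of colors.
partitions = {0: [5.0], 1: {"red": 0, "blue": 0, "green": 1}}

grid = build_data_grid(rows, labels, partitions)
for cell, counts in sorted(grid.items()):
    print(cell, dict(counts))
```

In this toy grid every non-empty cell contains a single output class, which is the kind of cell-wise class distribution a data grid is meant to expose when the selected input variables are informative.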