Data Grid Models in the Agnostic Learning vs. Prior Knowledge Challenge
Marc Boullé, France Telecom R&D, marc.boulle@orange-ftgroup.com

We introduce a new method to automatically, rapidly and reliably evaluate the predictive information of any subset of variables in supervised learning. It is based on partitioning each input variable into intervals in the numerical case and into groups of values in the categorical case. The cross-product of the univariate partitions forms a multivariate partition of the input representation space into a set of cells. This multivariate partition, called a data grid, makes it possible to evaluate the correlation between the input variables and the output variable. The best data grid is searched for using a Bayesian model selection approach and combinatorial algorithms. Three classification techniques that exploit data grids in different ways are presented and evaluated in the Agnostic Learning vs. Prior Knowledge Challenge. These preliminary experiments demonstrate the value of using data grids in machine learning tasks.
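To make the notion of a data grid concrete, the Python sketch below takes fixed univariate partitions (sorted cut points for a numerical variable, value groups for a categorical one), maps each instance to the cell given by the cross-product of its parts, and counts the output classes in each cell. The function names (`cell_of`, `build_data_grid`, `count_cells`) and the partition encoding are hypothetical illustrations. The Bayesian model selection criterion and the combinatorial search for the best grid are not reproduced here; the partitions are simply given.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def group_index(value, groups):
    """Index of the group (a set of categorical values) that contains value."""
    for i, group in enumerate(groups):
        if value in group:
            return i
    raise ValueError(f"value {value!r} is not covered by any group")


def cell_of(instance, partitions):
    """Map an instance (one value per input variable) to its data grid cell.

    Numerical partitions are sorted cut points c0 < c1 < ...; the intervals
    are (-inf, c0), [c0, c1), ..., [c_last, +inf). Categorical partitions
    are lists of disjoint value sets. The cell is a tuple of part indices.
    """
    cell = []
    for value, (kind, spec) in zip(instance, partitions):
        if kind == "numerical":
            cell.append(bisect_right(spec, value))
        else:
            cell.append(group_index(value, spec))
    return tuple(cell)


def build_data_grid(instances, labels, partitions):
    """Count the output class frequencies in each non-empty cell."""
    grid = defaultdict(Counter)
    for x, y in zip(instances, labels):
        grid[cell_of(x, partitions)][y] += 1
    return grid


def count_cells(partitions):
    """Total number of cells: product of the number of parts per variable."""
    total = 1
    for kind, spec in partitions:
        total *= len(spec) + 1 if kind == "numerical" else len(spec)
    return total


# Hypothetical two-variable example: 3 intervals x 2 value groups = 6 cells.
partitions = [
    ("numerical", [2.0, 5.0]),
    ("categorical", [{"red", "orange"}, {"blue"}]),
]
instances = [(1.0, "red"), (1.5, "orange"), (3.0, "blue"),
             (4.0, "blue"), (6.0, "red"), (7.0, "red")]
labels = ["A", "A", "B", "B", "A", "B"]

grid = build_data_grid(instances, labels, partitions)
for cell in sorted(grid):
    print(cell, dict(grid[cell]))
```

Only non-empty cells are stored, so the dictionary never holds more entries than there are training instances, even though the number of cells grows multiplicatively with the number of input variables.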