ICML 2011 workshop on unsupervised and transfer learning

Information Theoretic Model Validation by Approximate Optimization
Joachim M. Buhmann
Department of Computer Science, ETH Zurich

Model selection in pattern recognition requires (i) specifying a suitable cost function for the data interpretation and (ii) controlling the degrees of freedom depending on the noise level in the data. We advocate an information theoretic perspective in which the uncertainty in the measurements quantizes the solution space of the underlying optimization problem, thereby adaptively regularizing the cost function. A pattern recognition model that can tolerate a higher level of fluctuations in the measurements than alternative models is considered superior, provided that its solution is equally informative. The optimal tradeoff between "informativeness" and "robustness" is quantified by the approximation capacity of the selected cost function.
Empirical evidence for this model selection concept is provided by cluster validation in computer security (multilabel clustering of Boolean data for role-based access control), as well as by high-dimensional Gaussian mixture models and the analysis of microarray data. Furthermore, the approximation capacity of the SVD cost function suggests an optimal cutoff value for the SVD spectrum.
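As a minimal illustration of the spectrum-cutoff idea (this sketch is not the paper's approximation-capacity criterion; the noise threshold `tau` and the synthetic data are assumptions introduced here), truncating an SVD at a cutoff rank trades reconstruction fidelity against suppression of noise-dominated components:

```python
import numpy as np

def truncated_svd(X, rank):
    """Best rank-`rank` approximation of X in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

def cutoff_rank(X, tau):
    """Count singular values above a (hypothetical) noise threshold tau."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > tau))

# Synthetic low-rank signal plus small Gaussian noise (illustration only).
rng = np.random.default_rng(0)
signal = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
X = signal + 0.01 * rng.standard_normal((50, 40))

r = cutoff_rank(X, tau=1.0)   # tau=1.0 is an assumed threshold
X_hat = truncated_svd(X, r)
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Here the three signal singular values dwarf the noise spectrum, so the threshold recovers the true rank; the paper's contribution is to derive such a cutoff from the approximation capacity rather than from an ad hoc threshold.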