We present a common point of view that provides a unified framework for model assessment across these different areas of machine learning. The main idea is that an algorithm generalizes well if its solution on one data set shows small disagreement with its solution on another data set. This idea is independent of the amount of label information supplied to the problem; the challenge is to define disagreement in a meaningful way. The main emphasis lies on developing model assessment procedures for unsupervised clustering.
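The disagreement idea can be made concrete as follows: fit the same clustering algorithm on two disjoint halves of the data, transfer one solution to the other half, and measure the fraction of mismatched labels under the best matching of cluster labels. The following is a minimal sketch, assuming k-means as the base algorithm and Hungarian matching of cluster labels; the helper names `disagreement` and `stability` are illustrative, not taken from the text:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def disagreement(a, b, k):
    """Minimal fraction of differing labels over all permutations
    of the k cluster labels, found via Hungarian matching."""
    cm = np.zeros((k, k))
    for i, j in zip(a, b):
        cm[i, j] += 1
    row, col = linear_sum_assignment(-cm)  # maximize agreement
    return 1.0 - cm[row, col].sum() / len(a)

def stability(X, k, n_splits=10, seed=0):
    """Average disagreement between clusterings fitted on two
    disjoint halves of X, both evaluated on the second half."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(X))
        half = len(X) // 2
        X1, X2 = X[idx[:half]], X[idx[half:]]
        km1 = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X1)
        km2 = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X2)
        # Transfer the first solution to the second sample and compare.
        scores.append(disagreement(km1.predict(X2), km2.labels_, k))
    return float(np.mean(scores))
```

A low stability score indicates that the clustering structure is reproducible across samples; comparing scores over a range of k is one way to pose the model selection question discussed below.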
Data clustering comprises a set of frequently employed techniques in exploratory data analysis for extracting "natural" group structure from data. Such groupings need to be validated in order to separate the signal in the data from spurious structure. In this context, finding an appropriate number of clusters is a particularly important model selection question.
We demonstrate the model selection capabilities of the stability method both on toy examples and on real-world clustering applications, with particular emphasis on the problem of finding segments in natural images. While stable clusterings do not necessarily correspond to the "desired" segmentation solutions (which probably cannot be expected from any internal model assessment strategy), we demonstrate that the absence of stable solutions is a good indicator of model mismatch.