ICML 2011 workshop on unsupervised and transfer learning

Clustering: Science or Art?

Ulrike von Luxburg
Max Planck Institute for Intelligent Systems, Tuebingen, Germany
Robert C. Williamson
Australian National University, Australia
Isabelle Guyon
ClopiNet, Berkeley, California

We examine whether the quality of different clustering algorithms can be compared by a general, scientifically sound procedure that is independent of particular clustering algorithms. We argue that the major obstacle is the difficulty of evaluating a clustering algorithm without taking the context into account: why does the user cluster his data in the first place, and what does he want to do with the clustering afterwards? We argue that clustering should not be treated as an application-independent mathematical problem, but should always be studied in the context of its end-use. Different techniques for evaluating clustering algorithms have to be developed for different uses of clustering. To simplify this procedure, we argue that it will be useful to build a "taxonomy of clustering problems" that identifies clustering applications which can be treated in a unified way. Such an effort will be more fruitful than attempting the impossible: developing "optimal" domain-independent clustering algorithms, or even classifying clustering algorithms in terms of how they work.