Some Baseline Methods for the
Active Learning Challenge
Gavin Cawley
University of East Anglia, UK
In many potential applications
of machine learning, unlabelled data are abundantly available
at low cost, but there is a paucity of labelled data, and labelling
unlabelled examples
is expensive and/or time-consuming. This motivates the development of
active learning
methods that seek to direct the collection of labelled examples such
that the greatest performance
gains can be achieved using the smallest quantity of labelled data. In
this paper,
we describe some simple pool-based active learning strategies, based on
optimally regularised
linear [kernel] ridge regression, providing a set of baseline
submissions for the Active
Learning Challenge. A simple random strategy, where unlabelled patterns
are submitted
to the oracle purely at random, is found to be surprisingly effective,
being competitive with
more complex approaches.
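The random baseline described above can be sketched as a simple pool-based loop: a kernel ridge regression model is refitted each time a randomly chosen unlabelled pattern is submitted to the oracle. The sketch below is illustrative only (it is not the author's code, does not perform the optimal regularisation described in the paper, and uses a toy synthetic problem); all function names and parameter values are assumptions.

```python
# Minimal sketch of the random pool-based active learning baseline:
# kernel ridge regression on a growing, randomly queried labelled set.
# Toy data and fixed hyperparameters; purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_krr(X, y, lam=1e-2, gamma=1.0):
    # Closed-form kernel ridge regression: alpha = (K + lam*I)^-1 y.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(alpha, X_train, X_test, gamma=1.0):
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Toy binary problem: the label is the sign of the first feature.
X_pool = rng.normal(size=(200, 2))
y_pool = np.sign(X_pool[:, 0])      # "oracle" labels, queried on demand
X_test = rng.normal(size=(100, 2))
y_test = np.sign(X_test[:, 0])

labelled = list(rng.choice(len(X_pool), size=5, replace=False))
for _ in range(10):                  # ten queries to the oracle
    unlabelled = [i for i in range(len(X_pool)) if i not in labelled]
    labelled.append(int(rng.choice(unlabelled)))   # random strategy
    alpha = fit_krr(X_pool[labelled], y_pool[labelled])
    pred = np.sign(predict(alpha, X_pool[labelled], X_test))
    acc = (pred == y_test).mean()
print(f"accuracy after {len(labelled)} labels: {acc:.2f}")
```

A query-driven strategy would replace the `rng.choice(unlabelled)` line with a selection criterion (e.g. the pattern closest to the current decision boundary); the random baseline simply skips that step.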