Some Baseline Methods for the Active Learning Challenge

Gavin Cawley
University of East Anglia, UK

In many potential applications of machine learning, unlabelled data are abundantly available
at low cost, but there is a paucity of labelled data, and labelling unlabelled examples
is expensive and/or time-consuming. This motivates the development of active learning
methods, which seek to direct the collection of labelled examples such that the greatest performance
gains can be achieved using the smallest quantity of labelled data. In this paper,
we describe some simple pool-based active learning strategies, based on optimally regularised
linear [kernel] ridge regression, providing a set of baseline submissions for the Active
Learning Challenge. A simple random strategy, where unlabelled patterns are submitted
to the oracle purely at random, is found to be surprisingly effective, being competitive with
more complex approaches.
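
To illustrate the pool-based setting described above, the following sketch shows a random query strategy wrapped around cross-validated kernel ridge regression. It is not the author's implementation: the synthetic data, the RBF kernel, the regularisation grid, and the budget of ten queries are assumptions made purely for the example (scikit-learn and NumPy are used).

    # Illustrative sketch (not the paper's code): pool-based active learning
    # with a random query strategy, using kernel ridge regression on -1/+1
    # labels as the classifier.  Dataset, kernel and regularisation grid are
    # placeholder assumptions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(0)

    # Synthetic stand-in for a challenge dataset: a pool plus a test set.
    X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
    y = 2 * y - 1                       # encode labels as -1/+1 for regression
    X_pool, y_pool = X[:1000], y[:1000]
    X_test, y_test = X[1000:], y[1000:]

    labelled = list(rng.choice(len(X_pool), size=10, replace=False))
    unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

    for round_ in range(10):
        # "Optimally regularised" kernel ridge regression: tune the ridge
        # parameter by cross-validation on the labelled examples seen so far.
        model = GridSearchCV(KernelRidge(kernel="rbf", gamma=0.05),
                             {"alpha": np.logspace(-3, 3, 7)}, cv=3)
        model.fit(X_pool[labelled], y_pool[labelled])

        auc = roc_auc_score(y_test, model.predict(X_test))
        print(f"round {round_}: {len(labelled)} labels, test AUC = {auc:.3f}")

        # Random query strategy: ask the oracle to label a randomly chosen
        # unlabelled pattern (here the oracle is simply the held-back label).
        query = int(rng.choice(unlabelled))
        unlabelled.remove(query)
        labelled.append(query)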
