Machine Learning is the science of building hardware or software that can
achieve tasks by learning from examples. The examples often come as
{input, output} pairs. Given new inputs, a trained machine can make
predictions of the unknown output.
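As an illustration of learning from {input, output} pairs, here is a minimal sketch of a nearest-neighbor predictor. The toy data and the function names `train` and `predict` are assumptions made for this example; they do not come from any of the challenges.

```python
from math import dist

def train(examples):
    """'Training' a nearest-neighbor predictor just stores the examples."""
    return list(examples)

def predict(model, x):
    """Return the output of the stored example whose input is closest to x."""
    nearest_input, nearest_output = min(model, key=lambda pair: dist(pair[0], x))
    return nearest_output

# Hypothetical toy data: inputs are 2-D points, outputs are class labels.
examples = [
    ((0.0, 0.0), "A"),
    ((0.1, 0.3), "A"),
    ((1.0, 1.0), "B"),
    ((0.9, 0.8), "B"),
]

model = train(examples)
prediction = predict(model, (0.8, 0.9))  # a new input with unknown output
print(prediction)
```

The new input lies closest to the stored example (0.9, 0.8), so the predictor returns that example's label.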
Examples of machine learning tasks include:
We organize challenges to stimulate research in this field. The web
sites of past challenges remain open for post-challenge submissions as
ongoing benchmarks.
Feature selection (NIPS 2003): Seventy-five participants competed on
five classification problems to make the best predictions and select the
smallest possible subset of relevant input variables (features). The
tasks include cancer diagnosis from mass-spectrometry data, handwritten
digit recognition, text classification, and drug discovery.
Edited book (CD with data and code); Matlab software and course material
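To make the idea of selecting a small subset of relevant input variables concrete, the following is a minimal sketch of a filter-style feature ranking. It is not the method used by any challenge participant; the scoring rule, the toy data, and the names `feature_scores` and `select_top_k` are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def feature_scores(X, y):
    """Score each feature by |mean(class 1) - mean(class 0)| / spread."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        pos = [v for v, label in zip(column, y) if label == 1]
        neg = [v for v, label in zip(column, y) if label == 0]
        spread = pstdev(column) + 1e-12  # small constant avoids division by zero
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores

def select_top_k(X, y, k):
    """Return the (sorted) indices of the k highest-scoring features."""
    scores = feature_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Hypothetical toy data: 4 examples, 3 features, binary labels.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 3.0, 1.1],
    [0.9, 4.0, 0.9],
    [1.0, 4.2, 1.0],
]
y = [0, 0, 1, 1]

selected = select_top_k(X, y, k=1)
print(selected)
```

A univariate ranking like this ignores interactions between features, so it is only a starting point; wrapper and embedded methods trade more computation for the ability to account for such interactions.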
Performance prediction (WCCI 2006): One hundred and forty-five
participants competed on five classification problems to make the best
predictions and predict their generalization performance on new unseen
data. The tasks include marketing, drug discovery, text classification,
handwritten digit recognition, and ecology.
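One common way to predict generalization performance from the training data alone is k-fold cross-validation. The sketch below is an assumption-laden illustration, not the challenge's evaluation protocol: the nearest-neighbor classifier, the toy data, and the name `cross_val_error` are all chosen for this example.

```python
from math import dist

def predict_1nn(train_set, x):
    """Predict the label of the training example closest to x."""
    return min(train_set, key=lambda pair: dist(pair[0], x))[1]

def cross_val_error(examples, n_folds):
    """Estimate the error rate on unseen data by k-fold cross-validation."""
    errors = 0
    for fold in range(n_folds):
        # Every n_folds-th example, starting at index `fold`, is held out.
        held_out = examples[fold::n_folds]
        training = [ex for i, ex in enumerate(examples) if i % n_folds != fold]
        errors += sum(predict_1nn(training, x) != label for x, label in held_out)
    return errors / len(examples)

# Hypothetical toy data: 2-D inputs with binary labels.
examples = [
    ((0.0, 0.0), 0),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((1.0, 1.0), 1),
    ((0.9, 0.8), 1),
    ((0.4, 0.4), 1),
]

estimated_error = cross_val_error(examples, n_folds=3)
print(f"estimated error rate: {estimated_error:.3f}")
```

Each example is held out exactly once, so the estimate uses all the data for testing while never testing on an example that was used for training in the same fold.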
Agnostic learning vs. prior knowledge (NIPS 2006 and IJCNN 2007): This
challenge has two tracks: the agnostic learning track and the prior
knowledge track, corresponding to two versions of five datasets. The
“agnostic track” version of the data is ready-to-use data preprocessed
in a feature-based representation suitable for off-the-shelf machine
learning packages. The identity of the features is not revealed. The
“prior knowledge track” version of the data is just raw data, not
always in a feature representation, coming with information about the
nature and source of the data. Can you do better with the raw data and
prior knowledge about the task? How far can you get with pure “black
box learning”?
Learning causal dependencies (WCCI 2008 and NIPS 2008): What affects
your health? What affects the economy? What affects climate change? And
which actions will have beneficial effects? This series of competitions
challenges the participants to discover the causes of given effects,
based on observational data. The datasets include re-simulation data
from models closely resembling real systems and real data for which the
causal dependencies are known from experimental evidence.
NIPS 2008 workshop page
Fast scoring in a large database (KDD Cup 2009): Customer Relationship
Management (CRM) is a key element of modern marketing strategies. The
KDD Cup 2009 offered the opportunity to work on large marketing
databases from the French telecom company Orange to predict the
propensity of customers to switch provider (churn), buy new products or
services (appetency), or buy upgrades or add-ons proposed to them to
make the sale more profitable (up-selling).
KDD Cup 2009 workshop page; Results; JMLR W&CP proceedings vol. 7
Active Learning Challenge (AISTATS 2010 and WCCI 2010): Labeling data is
expensive, but large amounts of unlabeled data are available at low
cost. Such problems can be tackled from two angles: learning from
unlabeled data or active learning. In the former case, the algorithms
must make do with the limited amount of labeled data and capitalize on
the unlabeled data with semi-supervised learning methods. In the latter
case, the algorithms may place a limited number of queries to get
labels. The goal in that case is to choose which data to label so that
the queries are as informative as possible; this problem is referred to
as active learning.
Challenge website; AISTATS 2010 workshop; WCCI 2010 workshop
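The query loop of active learning can be sketched as pool-based uncertainty sampling: at each step the learner asks for the label of the unlabeled point it is least sure about. This is a minimal sketch under strong assumptions, not the challenge protocol: the inputs are one-dimensional, the classes are separable by a threshold, and `oracle`, `fit_threshold`, and `active_learn` are hypothetical names introduced here.

```python
def fit_threshold(labeled):
    """Place the decision threshold midway between the largest known
    negative input and the smallest known positive input."""
    max_neg = max(x for x, y in labeled if y == 0)
    min_pos = min(x for x, y in labeled if y == 1)
    return (max_neg + min_pos) / 2

def active_learn(pool, oracle, labeled, budget):
    """Spend at most `budget` label queries, always on the pool point
    closest to the current threshold (the most uncertain one)."""
    pool = list(pool)
    labeled = list(labeled)
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))  # one paid label query
    return fit_threshold(labeled), labeled

def oracle(x):
    """Hypothetical labeler: the true class boundary is at 62."""
    return 1 if x >= 62 else 0

pool = range(1, 100)           # unlabeled inputs, cheap to obtain
seed = [(0, 0), (100, 1)]      # the few labels available up front
threshold, labeled = active_learn(pool, oracle, seed, budget=4)
print(threshold, len(labeled))
```

In this one-dimensional setting the strategy behaves like a bisection search, so a handful of queries narrows the boundary far more than the same number of randomly chosen labels typically would.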
We are very grateful to our sponsors:
This project is supported by the National Science Foundation under
Grants No. ECCS-0424142, ECCS-0736687, and ECCS-0725746. Any opinions,
findings, and conclusions or recommendations expressed in this material
are those of the authors and do not necessarily reflect the views of the
National Science Foundation.