ICML 2011 Workshop on Unsupervised and Transfer Learning
Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism
Si-Chi Chin and W. Nick Street
Interdisciplinary Graduate Program in Informatics (IGPI),
The University of Iowa
The paper applies knowledge transfer methods to the problem of
detecting Wikipedia vandalism, defined as malicious editing
intended to compromise the integrity of the content of articles. A
major challenge in detecting Wikipedia vandalism is the lack of a large
amount of labeled training data. Knowledge transfer addresses this
challenge by leveraging previously acquired knowledge from a source
task. However, the characteristics of Wikipedia vandalism are
heterogeneous, ranging from a small replacement of a letter to a
massive deletion of text. Avoiding negative transfer becomes a primary
concern given this heterogeneous nature. The paper explores knowledge
transfer methods to generalize learned models from a heterogeneous
dataset to a more uniform dataset while avoiding negative transfer. The
two proposed segmented transfer (ST) approaches map unlabeled data from
the target task to the most related cluster from the source task and
classify the unlabeled data using the most relevant learned models.
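
The mapping step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional features, the pre-assigned source clusters, the Euclidean nearest-centroid mapping, and the toy per-cluster classifier (the hypothetical names ClusterModel and segmented_transfer_predict) are all assumptions chosen only to show the idea of routing each target instance to the model of its most related source cluster.

```python
from math import dist
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    return tuple(mean(column) for column in zip(*points))


class ClusterModel:
    """Toy model for one source cluster (an assumption, not the paper's classifier).

    It stores the cluster centroid and one centroid per class label, and
    predicts the label whose class centroid is closest to the input.
    """

    def __init__(self, points, labels):
        self.center = centroid(points)
        self.class_centers = {
            label: centroid([p for p, l in zip(points, labels) if l == label])
            for label in set(labels)
        }

    def predict(self, x):
        return min(self.class_centers, key=lambda label: dist(x, self.class_centers[label]))


def segmented_transfer_predict(models, x):
    """Map x to the nearest source cluster, then classify with that cluster's model."""
    nearest = min(models, key=lambda model: dist(x, model.center))
    return nearest.predict(x)


# Hypothetical source data: (edit size, suspiciousness score); 1 = vandalism, 0 = regular.
small_edits = ClusterModel(
    [(1.0, 0.0), (1.0, 0.2), (1.2, 0.9), (0.8, 1.0)], [0, 0, 1, 1]
)
large_deletions = ClusterModel(
    [(10.0, 0.1), (9.0, 0.0), (10.0, 0.9), (11.0, 1.0)], [0, 0, 1, 1]
)
source_models = [small_edits, large_deletions]

# Unlabeled target instances are routed to the closest cluster before classification.
pred_small = segmented_transfer_predict(source_models, (1.1, 0.8))
pred_large = segmented_transfer_predict(source_models, (9.8, 0.1))
```

In this sketch, a target edit resembling small letter replacements is never scored by the model trained on massive deletions, which is one simple way to limit negative transfer across heterogeneous kinds of vandalism.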