ICML 2011 Workshop on Unsupervised and Transfer Learning

Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism

Si-Chi Chin and W. Nick Street
Interdisciplinary Graduate Program in Informatics (IGPI),
The University of Iowa

This paper applies knowledge transfer methods to the problem of detecting Wikipedia vandalism, defined as malicious editing intended to compromise the integrity of article content. A major challenge in detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from the replacement of a letter to the massive deletion of text. Given this heterogeneity, avoiding negative transfer becomes a primary concern. The paper explores knowledge transfer methods that generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two proposed segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task and classify the unlabeled data using the most relevant learned models.
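The general idea of segmented transfer (map each target instance to its most related source cluster, then classify it with the model learned on that cluster) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two toy features, the segment names "A" and "B", the use of Euclidean distance to cluster centroids as the relatedness measure, and the nearest-class-mean rule standing in for whatever classifier is trained per cluster. The source segments are also given directly rather than discovered by clustering.

```python
from collections import defaultdict
from math import dist


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def train_segment_models(source):
    """Build one simple model per source segment.

    source: list of (features, label, segment_id) triples.
    Returns {segment_id: (segment_centroid, {label: class_mean})}.
    """
    by_segment = defaultdict(list)
    for features, label, segment in source:
        by_segment[segment].append((features, label))

    models = {}
    for segment, rows in by_segment.items():
        by_label = defaultdict(list)
        for features, label in rows:
            by_label[label].append(features)
        centroid = mean([features for features, _ in rows])
        class_means = {label: mean(fs) for label, fs in by_label.items()}
        models[segment] = (centroid, class_means)
    return models


def classify(models, x):
    """Map x to the nearest source segment, then apply that segment's model."""
    # Step 1 (assumed relatedness measure): nearest segment centroid.
    segment = min(models, key=lambda s: dist(models[s][0], x))
    # Step 2 (assumed classifier): nearest class mean within that segment.
    class_means = models[segment][1]
    label = min(class_means, key=lambda c: dist(class_means[c], x))
    return segment, label


# Hypothetical toy source data: (edit size, fraction of uppercase text).
source = [
    ((0.1, 0.1), "regular", "A"), ((0.2, 0.1), "regular", "A"),
    ((0.1, 0.9), "vandalism", "A"), ((0.2, 0.8), "vandalism", "A"),
    ((5.0, 0.1), "regular", "B"), ((6.0, 0.2), "regular", "B"),
    ((5.0, 0.9), "vandalism", "B"), ((6.0, 0.8), "vandalism", "B"),
]

models = train_segment_models(source)
print(classify(models, (0.15, 0.85)))  # ('A', 'vandalism')
print(classify(models, (5.4, 0.15)))   # ('B', 'regular')
```

In this sketch a small edit and a large edit are routed to different segments before classification, so a model fitted to one kind of edit is never applied to the other, which is the intuition behind limiting negative transfer on heterogeneous source data.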