CVPR 2011 workshop on gesture recognition
A tutorial on deep and unsupervised feature learning for activity recognition
Graham Taylor
New York University, USA
Recognition of human activity from video data is a challenging problem that has
received an increasing amount of attention from the computer vision
community in recent years. Currently the best-performing methods for
this task are based on engineered descriptors with explicit local
geometric cues and other heuristics. Until very recently, learning has
played a major role only at the classification stage, by which point
much of the information in the input has already been discarded. It has
been shown that learning features in
a supervised, unsupervised, or semi-supervised setting can improve
performance in other vision tasks, but most of these works have
concentrated on static images rather than video. In this tutorial, we
will review a number of recently proposed methods that attempt to learn
low and mid-level features for use in activity recognition. This
includes deep and unsupervised feature learning methods such as
convolutional networks, convolutional deep belief networks, and other
approaches that learn sparse, overcomplete representations.
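The convolutional approaches mentioned above, when applied to video, typically filter the input over both space and time rather than frame by frame. The following is a minimal sketch of that spatio-temporal filtering step, assuming a grayscale video stored as a NumPy array of shape (T, H, W). It is not any specific method reviewed in this tutorial. The function names `conv3d_valid` and `feature_maps` are hypothetical, and the filter is set by hand for illustration, whereas the methods discussed here would learn their filters from data.

```python
import numpy as np


def conv3d_valid(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' spatio-temporal cross-correlation (no kernel flip, as in most
    CNN implementations) of a (T, H, W) video with a (t, h, w) filter."""
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):          # time offset
        for j in range(out.shape[1]):      # vertical offset
            for k in range(out.shape[2]):  # horizontal offset
                patch = video[i:i + t, j:j + h, k:k + w]
                out[i, j, k] = np.sum(patch * kernel)
    return out


def feature_maps(video: np.ndarray, filters) -> np.ndarray:
    """Apply each filter and rectify, giving one feature map per filter.
    The ReLU nonlinearity is an assumption chosen for illustration."""
    return np.stack([np.maximum(conv3d_valid(video, f), 0.0) for f in filters])


# Hypothetical example: a temporal-difference filter that responds only
# where pixel intensity changes from one frame to the next.
video = np.zeros((3, 4, 4))
video[1, 1, 1] = 1.0                            # a pixel flashes in frame 1
temporal_diff = np.array([-1.0, 1.0]).reshape(2, 1, 1)
maps = feature_maps(video, [temporal_diff])
print(maps.shape)  # (1, 2, 4, 4)
```

A static video produces an all-zero response from this filter, while the flashing pixel produces a positive response at the frame where it appears. This kind of motion sensitivity is what spatio-temporal filters can capture and purely per-frame filters cannot.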