CVPR 2011 workshop on gesture recognition

A tutorial on deep and unsupervised feature learning for activity recognition
Graham Taylor
New York University, USA

Recognition of human activity from video data is a challenging problem that has received an increasing amount of attention from the computer vision community in recent years. Currently, the best-performing methods for this task are based on engineered descriptors with explicit local geometric cues and other heuristics. Until very recently, learning has played a major role only at the classification stage, by which point much of the information in the input has already been lost. It has been shown that learning features in a supervised, unsupervised, or semi-supervised setting can improve performance in other vision tasks, but most of this work has concentrated on static images rather than video. In this tutorial, we will review a number of recently proposed methods that attempt to learn low- and mid-level features for use in activity recognition. These include deep and unsupervised feature learning methods such as convolutional networks, convolutional deep belief networks, and other approaches that learn sparse, overcomplete representations.