The workshop was video-recorded by VideoLectures
First call for papers: JMLR Special Topic on Gesture Recognition

attendees   attendees2

interpreter   filming

Invited speakers:
Aleix Martinez Aleix Martinez, Ohio State University, USA. Deciphering the Face. Much progress has been made to understand human cognition. Yet, little is know about face perception. Facial expressions of emotions is a clear example of this limitation. It has been established how the underlying muscles move in response to felt emotions, but little is known on how these constructs are interpreted by the visual system and how to build robust computer vision systems that emulate human perception. In this talk, we will review the recent literature on this subject and proposes a model for the perception of facial expressions of emotion.  [more...]
Greg Mori Greg Mori, Simon Fraser University, Canada. Learning structured models for recognizing human actions. The development of automatic methods for recognizing human actions is a challenging computer vision problem.  Robust solutions to this problem would facilitate a variety of applications from image retrieval to improving safety in assisted living facilities.  In this talk I will present work towards solving this problem via the learning of structured models.  I will describe a model that uses a hidden Conditional Random Field (hCRF) to learn a representation for motion parts in conjunction with whole-body templates.  Second, a variant of this model is used for treating body pose as a latent variable for action recognition. [more...]
Richard Bowden Richard Bowden, Univ. Surrey, UK. From activity to language: learning to recognise the meaning of motion Whether the task is recognising an atomic action of an individual or their implied activity, the continuous multichannel nature of sign language recognition or the appearance of words on the lips, all approaches can be categorised at the most basic level as the learning and recognition of spatio-temporal patterns. The complexity of the problem to some extend dictates the approaches used e.g. for simple gestures and actions, holistic descriptors can work well but are less suitable for the subtitles and complexities of sign. However, [more...]
Graham Taylor
Graham Taylor, NYU, New-York. Tutorial on applications of Deep Learning methods to activity recognition. Recognition of human activity from video data is a challenging problem that has received an increasing amount of attention from the computer vision community in recent years. Currently the best performing methods at this task are based on engineered descriptors with explicit local geometric cues and other heuristics. In this tutorial, we will review a number of recently proposed methods that attempt to learn low and mid-level features for use in activity recognition. This includes deep and unsupervised feature learning methods such as convolutional networks, convolutional deep belief networks and other approaches [more...]
David Forsyth
David Forsyth, University of Illinois at Urbana-Champaign. Understanding human activity. David Forsyth is well known for his work on human motion computing. He and co-authors have demonstrated the first robust, accurate human tracker that can reliably report the configuration of arms and legs and does not need to be started by hand. They showed that motion capture data can be rearranged to produce highly realistic animations of novel human motions, and demonstrated that close control of the nature of the motion was possible using annotations. They then demonstrated that this animation system can be linked to the output of this tracker, to obtain annotations describing human activities automatically from video. 
Sudeep Sarkar
Sudeep Sarkar, University of South Florida. Segmentation-robust representations, matching, modeling for sign language recognition. Distinguishing true signs from transitional, extraneous movements made by the signer as s/he moves from one sign to the next is a serious hurdle in the design of continuous Sign Language recognition systems. This problem is further compounded by the ambiguity of segmentation and occlusions, resulting in propagation of errors to higher levels. This talk will describe our experience with representations and matching methods, particularly those that can handle errors in low-level segmentation methods and uncertainties in segmentation of signs in sentences.  [More...]
Dimitris Metaxas
Dimitris Metaxas, Rutgers University, New Jersey. Sign language and human activity recognition.  Dimitris Metaxas is a Professor II (Distinguished) in the Division of Computer and Information Sciences at Rutgers University where he conducts research on the development of formal methods upon which computer vision, computer graphics and medical imaging can advance synergistically. In particular he is focusing on human body and shape motion analysis, human surveillance, security applications, American Sign Language recognition, behavior modeling and analysis and scalable solutions to large and distributed sensor-based networks.
Maja Pantic
Maja Pantic, Imperial College, London. Facial Behaviour Understanding. The facial behaviour is our preeminent means to communicating affective and social signals. This talk discusses a number of components of human facial behavior, how they can be automatically sensed and analysed by computer, what is the past research in the field conducted by the iBUG group at Imperial College London, and how far we are from enabling computers to understand human facial behavior.  Maja Pantic is a Professor of Affective and Behavioral Computing and leader of the Imperial College of London, working on machine analysis of human non-verbal behavior and its applications to Human Computer Interaction.  [More...]
Christian Vogler
Christian Vogler, Gallaudet University, Washington DC. Advances in Phonetics-based Sub-Unit Modeling for Transcription Alignment and Sign Language Recognition. Christian Vogler is the director of the Technology Access Program. He is a principal investigator within the Rehabilitation Engineering Research Center (RERC) on Telecommunications Access, with a particular focus on the accessibility of web conferencing and telecollaboration systems. In his role at the RERC, he is involved in bringing consumers and industry together on accessibility issues, as well as developing prototype technologies for improving the accessibility of such systems. [More...]
Takeo Kanade
Takeo Kanade, Carnegie Mellon University. Body motion detection and understanding using both 2D and 3D setups. Takeo Kanade is the U. A. and Helen Whitaker University Professor of Computer Science and Robotics and the director of Quality of Life Technology Engineering Research Center at Carnegie Mellon University. He works in multiple areas of robotics: computer vision, multi-media, manipulators, autonomous mobile robots, medical robotics and sensors. He has written more than 300 technical papers and reports in these areas, and holds more than 20 patents. He has been the principal investigator of more than a dozen major vision and robotics projects at Carnegie Mellon.

We are organizing a workshop of gesture and sign language recognition from video data and still images. Gestures can originate from any body motion or state but commonly originate from the face or hand. Much recent research has focused on emotion recognition from face and hand gestures, with applications in gaming, marketing, and computer interfaces. Many approaches have been made using cameras and computer vision algorithms to interpret sign language for the deaf. However, the identification and recognition of postures and human behaviors is also the subject of gesture recognition techniques and has gained importance in applications such as video surveillance. This workshop aims at gathering researchers from different application domains working on gesture recognition to share algorithms and techniques.

One of the goals of the workshop is to launch a benchmark program on gesture recognition. Part of the workshop will be devoted to discussing the modalities of the benchmark. See our website under construction. Download sample data [or get your sample data DVD at the workshop]. Participate in a data exchange [You will need first to register to our Google group gesturechallenge to get access to the data exchange website].

We invite contributions relevant to gesture recognition, including:
-    Algorithms for gesture and activity recognition, in particular addressing
    o    Learning from unlabeled or partially labeled data
    o    Learning from few examples per class, and transfer learning.
    o    Continuous gesture recognition and segmentation
    o    Deep learning architectures, including convolutional neural networks
    o    Gesture recognition in challenging scenes, including cluttered/moving backgrounds or moving cameras, or scenes where multiple persons are present.
    o    Integrating information from multiple channels (e.g., position/motion of multiple body parts, hand shape, facial expressions).
-    Data representations
-    Applications pertinent to the workshop topic, such as involving:
    o    Video surveillance
    o    Image or video indexing and retrieval
    o    Recognition of sign languages for the deaf
    o    Emotion recognition and affective computing
    o    Computer interfaces
    o    Virtual reality
    o    Robotics
    o    Ambiant intelligence
    o    Games
-    Datasets and benchmarks

The proceedings will be published in the CVPR conference proceedings by IEEE. All papers have been reviewed by the program committee.In addition, the best papers will be invited to a special topic of JMLR and a book will be published by Microtome.

The paper submittion is over but you may still submit an abstract for the Breaking News poster session or for a Demonstration (deadline June 10, 2011) to
gesture@ clopinet . com. All contributions in the scope of the workshop will be accepted.


Monday, June 20, 2011
Colorado Ballroom B

7:30 am-9:00 am Breakfast (included)

[The papers are published in IEEE Explore. For your convenience the workshop participants can access preprints from this page. Login: gesture. Password: cvpr2k11]

Morning: Signs and gestures

9:00 am.
Christian Vogler, Gallaudet University, Washington DC. Advances in Phonetics-based Sub-Unit Modeling for Transcription, Alignment and Sign Language Recognition. Pascal2 best paper award.  Co-authors: Vassilis Pitsikalis, Stavros Theodorakis, and  Petros Maragos [Abstract][Preprint][Slides pptx][Sound] .
9:45 am.
Sudeep Sarkar. University of South Florida. Segmentation-robust Representations, Matching, and Modeling for Sign Language Recognition. Co-authors: Barbara Loeding, Ruiduo Yang, Sunita Nayak, Ayush Parashar. [Abstract][Preprint][Slides].
10:15 am. Break [coffee, snacks provided]
10:30 am.
Dimitri Metaxas, Rutgers University, New Jersey. Sign language and human activity recognition. [Slides].
11:00 am.
Richard Bowden, Univ. Surrey, UK. From activity to language: learning to recognise the meaning of motion. [Abstract][Slides].
11:30 am.
Aleix Martinez, Ohio State University, USA. Deciphering the Face. [Abstract][Preprint][Slides].

12:00-2:00 pm: Poster session and lunch (included) served in the foyer/hallway and Summit Ballroom (4th floor). Finger food will be provided so the participants can talk and view the posters during lunch time.

Afternoon: Gestures and actions

2:00 pm.
David Forsyth, University of Illinois at Urbana-Champaign. Understanding human activity. [Slides].
2:45 pm.
Graham Taylor, NYU, New-York. A tutorial on deep and unsupervised feature learning for activity recognition. [Abstract][Slides].
3:15 pm. Maja Pantic, Imperial College, London. Designing Frameworks for Automatic Affect Prediction and Classification in Dimensional Space. Co-authors: Mihalis A. Nicolaou and Hatice Gunes. [Abstract][Preprint][Slides].
3:45 pm.  Break.
4:00 pm.
Greg Mori, Simon Fraser University, Canada. Learning structured models for recognizing human actions. [Slides]
4:30 pm.
Takeo Kanade, Carnegie Mellon University. Body motion detection and understanding using both 2D and 3D setups.
5:00 pm. Discussion.
5:30 pm. Adjourn.

7:00 pm. Dinner invitation for the invited speakers and the organizers at Fratelli Ristorante. The other participants may join at their own expenses (please contact the organizers).


Each presenter will have one side of a poster board. Poster size = 4'x 8' max  (4' tall x 8' wide or 4' veritcal x 8' horizontal) see picture.

6DMG: A New 6D Motion Gesture Database. Mingyu Chen, Ghassan AlRegib, and Biing-Hwang Juang, Georgia Institute of Technology, Atlanta, Georgia, U.S.A. [Extended summary]

Efficient, Pose Invariant Facial Emotion Classification using 3D Constrained Local Model and 2D Shape Information. Laszlo A. Jeni, University of Tokyo, Japan, Hideki Hashimoto, Chuo University, Japan, András Lörincz, Eotvos Lorand University, Hungary. [Extended summary]

An Optical Flow-based Action Recognition Algorithm. Upal Mahbub, Hafiz Imtiaz, and Md. Atiqur Rahman Ahad. Bangladesh University of Engineering and Technology. [Extended summary]

SURF- and Optical Flow-based Action Recognition with Outlier Management. Atiqur Rahman Ahad, J. Tan, H. Kim, S. Ishikawa. Kyushu Institute of Technology, Japan. [Extended summary]


Gesture Data Collection (GDC) software for the Gesture Recognition Challenge. Isabelle Guyon, Clopinet, California and Vassilis Athitsos, University of Texas at Arlington. [Documentation][Download GDC[

Links to related workshops and benchmarks

SMC'2010 Workshop and Challenge on Robust machine learning techniques for human activity recognition. Just about to start.

Unsupervised and Transfer Learning Challenge. Just ended.

AISTATS 2011 active learning and experimental design workshop. With guest speakers Donald Rubin, Burr Settles, and David Jensen.

WCCI 2011 special session on autonomous and incremental learning and competition on active learning.

NIPS 2009 causality and time series mini-symposium.  Featuring a memorial lecture of Clive Granger by Halbert White.

NIPS 2008 causality workshop: objectives and assessment. The second challenge in causality organized by the causality workbench.

WCCI  2008 causation and prediction challenge. A first activity of the causality workbench.

NIPS 2006 workshop on causality and feature selection. A first workshop on causality..

IJCNN 2007 Agnostic learning vs. Prior knowledge challenge. “When everything fails, ask for additional domain knowledge” is the current motto of machine learning. Therefore, assessing the real added value of prior/domain knowledge is a both deep and practical question.The participants competed in two track: the “prior knowledge track” for which they had access to the raw data and information about the data, and the “agnostic learning track” for which they had access to preprocessed data with no knowledge of the identity of the features.

WCCI 2006 performance prediction challenge. “How good are you at predicting how good you are? 145 participants tried to answer that question. Cross-validation came very strong. Can you do better? Measure yourself against the winners by participating to the model selection game.

NIPS 2003 workshop on feature extraction and feature selection challenge. We organized a competition on five data sets in which hundreds of entries were made. The web site of the challenge is still available for post challenge submissions. Measure yourself against the winners! See the book we published with a CD containing the datasets, tutorials, papers on s.o.a. methods.

Pascal challenges: The Pascal network is sponsoring several challenges in Machine learning.

Data mining competitions:
A list of data mining competitions maintained by KDnuggets, including the well known KDD cup.

List of data sets for machine learning:
A rather comprehensive list maintained by MLnet.

UCI machine learning repository: A great collection of datasets for machine learning research.

DELVE: A platform developed at University of Torontoto benchmark machine learning algorithms.

Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a context with emphasis on gene selection, a special case of feature selection.

International Conference on Document Analysis and Recognition, a bi-annual conference proposing a contest in printed text recognition. Feature extraction/selection is a key component to win such a contest.

Text Retrieval conference, organized every year by NIST. The conference is organized around the result of a competition. Past winners have had to address feature extraction/selection effectively.

In conjunction with the International Conference on Pattern Recognition, ICPR 2004, a face recognition contest is being organized.

An important competition in protein structure prediction called Critical Assessment of
 Techniques for Protein Structure Prediction.

Program committee
We are very grateful to all the reviewers:

Aleix Martinez, Ohio State University, USA

David W. Aha, Naval Research Laboratory, USA
Abe Schneider, Knexus Research, USA
Jeffrey Cohn, Carnegie Mellon University, USA
Martial Hebert, Carnegie Mellon University, USA
Dimitris Metaxas, Rutgers,  New Jersey, USA
Christian Vogler, ILSP Athens, Greece
Sudeep Sarkar, University of South Florida, USA
Graham Taylor, NYU, New-York, USA
Andrew Ng, Stanford Univ., Palo Alto, CA, USA
Andrew Saxe, Stanford Univ., Palo Alto, CA, USA
Quoc Le, Stanford Univ., Palo Alto, CA, USA
David Forsyth, University of Illinois at Urbana-Champaign, USA
Maja Pantic, Imperial College, London
Philippe Dreuw, RWTH Aachen University, Germany
Richard Bowden, Univ. Surrey, UK
Fernando de la Torre, Carnegie Mellon University, USA
Paulo Gotardo, Ohio State University, Ohio, USA
Carol Neidle, Boston University, MA, USA
Trevor Darrell, UC Berkeley/ICSI, Berkeley, California, USA
Greg Mori, Simon Fraser University, Canada
Matthew Turk, UC Santa Barbara, USA
Atiqur Rahman Ahad, Faculty of Engineering, Kyushu Institute of Technology, Japan
Mingyu Chen,
Georgia Institute of Technology, USA
Wenhui Xu, Georgia Institute of Technology, USA
Jesus-Omar Ocegueda-Gonzalez, University of Houston, USA
Thomas Kinsman, Rochester Institute of Technology, USA
Lőrincz,  Eötvös Loránd University, Budapest, Hungary
Upal Mahbub, Bangladesh University of Engineering, Bangladesh
Subhransu Maj, UC Berkeley, CA, USA
Lubomir Bourdev,  UC Berkeley, CA, USA
Vassilis Pitsikalis, NTUA, Greece

Contact information

Organizing committee:
Isabelle Guyon, Clopinet, Berkeley, California
Vassilis Athitsos
, University of Texas at Arlington
Jitendra Malik
, UC Berkeley, California
Ivan Laptev
, INRIA, France


gesture@ clopinet . com.


We are grateful to the DARPA Deep Learning program, the Naval Research Laboratory, and the Pascal2 European network for their support.