Abstract
Recognition and Tracking of Human Motion. Talk given at the Math & Computer Science Department, University of Athens, Greece, May 27, 2003.
Recognition of human motion, such as gestures or facial expressions, poses many interesting challenges. In particular, the computational complexity of the task is a major obstacle and raises two fundamental questions: 1. How do we model and process the potentially infinite number of movements that a human can perform? 2. How do we account for the many movements that humans can perform simultaneously, such as gesturing with both hands at once, without getting bogged down in a combinatorial explosion of simultaneous events?
In this talk I present American Sign Language recognition as a case study to explore possible answers to these two questions. By breaking signs down into their constituent parts, the phonemes, it is possible to build a virtually infinite number of signs from a small number of basic building blocks. By splitting simultaneous events into channels that are stochastically independent of one another, it is possible to decouple them and to avoid having to model every possible combination a priori. I introduce parallel hidden Markov models, an extension of classical hidden Markov models (HMMs), to perform the actual recognition of simultaneous events.
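The abstract does not spell out the recognition algorithm, so the following is only a minimal Python sketch, assuming discrete-output HMMs and fully independent channels, of why independence avoids the combinatorial explosion. Each channel (for example, the left and the right hand) is scored separately with the standard scaled forward algorithm, and the channel log-likelihoods are summed, because the joint probability of independent channels is the product of their individual probabilities. The names `HMM`, `parallel_log_likelihood`, and `state_counts` and all toy parameters are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


class HMM:
    """Discrete-output HMM: initial probs pi[i], transitions A[i][j], emissions B[i][o]."""

    def __init__(self, pi: List[float], A: List[List[float]], B: List[List[float]]):
        self.pi, self.A, self.B = pi, A, B

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Scaled forward algorithm: returns log P(obs | model)."""
        n = len(self.pi)
        alpha = [self.pi[i] * self.B[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t in range(len(obs)):
            if t > 0:
                # alpha_t(j) = sum_i alpha_{t-1}(i) * A[i][j] * B[j][o_t]
                alpha = [sum(alpha[i] * self.A[i][j] for i in range(n)) * self.B[j][obs[t]]
                         for j in range(n)]
            scale = sum(alpha)  # normalize each step to avoid numerical underflow
            log_p += math.log(scale)
            alpha = [a / scale for a in alpha]
        return log_p


def parallel_log_likelihood(channels: Sequence[HMM], obs: Sequence[Sequence[int]]) -> float:
    """Assumes the channels are stochastically independent: the joint probability is the
    product of per-channel probabilities, i.e. the sum of their log-likelihoods."""
    return sum(model.log_likelihood(o) for model, o in zip(channels, obs))


def state_counts(states_per_channel: Sequence[int]) -> tuple:
    """Returns (states in one coupled product-space HMM, total states in parallel HMMs)."""
    return math.prod(states_per_channel), sum(states_per_channel)


# Two hypothetical channels (e.g. left and right hand), each a 2-state, 2-symbol HMM.
left = HMM([0.9, 0.1], [[0.7, 0.3], [0.0, 1.0]], [[0.8, 0.2], [0.1, 0.9]])
right = HMM([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.6, 0.4], [0.3, 0.7]])
score = parallel_log_likelihood([left, right], [[0, 0, 1], [0, 1, 1]])
print(state_counts([10, 10, 10]))  # (1000, 30)
```

In a recognizer, one would compute such a score for each candidate sign and pick the highest. The state count illustrates the design choice: three channels of 10 states each would require 1000 joint states in a single coupled HMM, but only 30 states when modeled as parallel HMMs.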
Many motion recognition frameworks assume that they have 3D data from the human body available. Yet, tracking the motion over a sequence of 2D images and extracting 3D information from them is a very difficult problem in its own right. To this end, I briefly discuss recent advances in tracking the human head and face with 3D deformable models, based on a novel statistical integration technique.
This talk draws on joint work with Siome Goldenstein and Dimitris Metaxas.
