Dissertation Abstract

Christian Vogler. American Sign Language Recognition: Reducing the Complexity of the Task with Phoneme-Based Modeling and Parallel Hidden Markov Models. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania, December 2002.

PDF file (8.7 MB)

In this thesis I present a framework for recognizing American Sign Language (ASL) from 3D data. The goal is to develop approaches that will scale well with increasing vocabulary sizes.

Scalability is a major concern, because the computational treatment of ASL is a very complex undertaking. There are two sources of this complexity that are particularly relevant to this thesis: First, ASL is a highly inflected language, where signs take on many different appearances, depending on the subject, object and number of the phrases. There are too many of these appearances to model them all separately.

Second, in ASL events occur both sequentially and simultaneously. Simultaneous events exist in speech recognition, too, but at an abstract level they can all be represented sequentially. In contrast, in ASL recognition this abstraction is not possible, because the sheer number of combinations of simultaneous events makes it infeasible to model them all explicitly. As a result, the computational treatment of ASL is much more complex than the computational treatment of spoken languages. Yet a complete, scalable recognition system must somehow be able to handle these simultaneous events in a systematic manner.

The framework uses a two-pronged approach to reduce the complexity of the recognition task, which encompasses work on both the modeling and the computational sides. On the modeling side it tackles the many different appearances by breaking the signs down into their constituent phonemes, which are limited in number. It uses the Movement-Hold phonological model for ASL as a guideline, and extends the parts of it that are not directly applicable to recognition systems. Furthermore, it recasts the model to describe simultaneous events in independent channels, so that it is no longer necessary to consider all their possible combinations, thus greatly reducing the modeling complexity.
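The reduction in modeling complexity can be illustrated with a small counting sketch. The channel names and phoneme inventory sizes below are hypothetical values chosen for illustration, not figures from the thesis; the point is only that explicit combinations grow as a product, whereas independent channels grow as a sum.

```python
from math import prod

# Hypothetical phoneme inventory sizes per channel (illustrative only,
# not figures from the thesis).
channel_sizes = {
    "strong_hand_movement": 30,
    "weak_hand_movement": 30,
    "strong_hand_handshape": 40,
}

# Modeling every simultaneous combination explicitly:
# one model per tuple of (movement, movement, handshape) phonemes.
combined_models = prod(channel_sizes.values())

# Modeling each channel independently:
# one model per phoneme per channel.
independent_models = sum(channel_sizes.values())

print(f"explicit combinations: {combined_models}")
print(f"independent channels:  {independent_models}")
```

With these assumed sizes, the explicit approach would need tens of thousands of models, while the independent-channel approach needs only as many models as there are phonemes across all channels.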

On the recognition side, it uses parallel hidden Markov models (PaHMMs) as an extension to conventional hidden Markov models. I develop a PaHMM recognition algorithm specifically geared toward the properties of sign languages. PaHMMs are the computational counterpart to modeling simultaneous events in independent channels: they allow these events to be put together on the fly at recognition time, instead of having to be considered a priori.
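The idea of combining independent channels at recognition time can be sketched as follows. This is a minimal illustration, not the recognition algorithm developed in the thesis: it assumes isolated signs, discrete observation symbols, and that the channels are treated as independent, so that per-channel log-likelihoods simply add. All function names, model parameters, and sign labels are hypothetical.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | HMM) via the scaled forward algorithm (discrete HMM)."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def pahmm_score(channel_obs, channel_models):
    """Sum of per-channel log-likelihoods (independence assumption)."""
    return sum(
        forward_log_likelihood(channel_obs[c], *channel_models[c])
        for c in channel_models
    )

def recognize(channel_obs, sign_models):
    """Return the sign whose channel models jointly score highest."""
    return max(sign_models, key=lambda s: pahmm_score(channel_obs, sign_models[s]))

# Toy two-state HMMs over symbols {0, 1}: (start, transitions, emissions).
favor0 = ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.8, 0.2]])
favor1 = ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.2, 0.8]])

# Hypothetical signs built by pairing channel models. Sign "C" reuses the
# same per-channel models in a new combination; no joint model is trained.
sign_models = {
    "A": {"movement": favor0, "handshape": favor0},
    "B": {"movement": favor1, "handshape": favor1},
    "C": {"movement": favor0, "handshape": favor1},
}

observation = {"movement": [0, 0, 0], "handshape": [1, 1, 1]}
best_sign = recognize(observation, sign_models)
print(best_sign)
```

In this sketch, each channel is scored by its own HMM and the combination is formed only when scoring a candidate sign, which mirrors the notion of putting simultaneous events together at recognition time rather than enumerating them beforehand.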

I validate the modeling approach and the PaHMM recognition algorithm in a pilot study with experiments on 53-sign and 22-sign data sets. In the PaHMM experiments, the independent channels consist of the hand movements of both hands, and the handshape of the strong hand. The results demonstrate the viability of both the phoneme modeling and the modeling of simultaneous events in independent channels.