Title: US6185529: Speech recognition aided by lateral profile image
Country: US United States of America

Inventor: Chen, Chengjun Julian; White Plains, NY
Wu, Frederick Yung-Fung; Cos Cob, CT
Yeh, James T.; Katonah, NY

Assignee: International Business Machines Corporation, Armonk, NY
Published / Filed: 2001-02-06 / 1998-09-14

Application Number: US1998000153219

IPC Code: Advanced: G06T 7/00; G09F 27/00; G10L 15/02; G10L 15/06; G10L 15/14; G10L 15/24;
IPC-7: G10L 15/24;

ECLA Code: G10L15/25;

U.S. Class: Current: 704/251; 704/231; 704/270; 704/E15.042;
Original: 704/251; 704/231; 704/270;

Field of Search: 704/24.8,215,252,251,256 382/118,190,159,202,100

Priority Number:
1998-09-14  US1998000153219

Abstract: An apparatus and a method for imaging the mouth area laterally to produce reliable measurements of mouth and lip shapes for use in assisting the speech recognition task. A video camera is arranged with a headset and a microphone to capture a lateral profile image of a speaker. The lateral profile image is then used to compute features such as lip separation, lip shape and intrusion depth parameters. The parameters are used in real time, during the speech recognition process, to characterize and discriminate spoken phonemes, yielding a high degree of accuracy in automatic speech recognition, especially in a noisy environment.
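
Illustration (not part of the patent text): the abstract names lip separation and intrusion depth as features computed from the lateral profile image. The following is a minimal Python sketch of how such features might be measured, assuming a binary silhouette of the mouth region with the speaker facing right. The function names, the mask layout, and the peak-picking rule are hypothetical choices made for this example; the record does not say how the patent computes these parameters.

```python
from typing import List, Tuple


def profile_contour(mask: List[List[int]]) -> List[int]:
    """For each image row, return the column of the rightmost face pixel (-1 if none)."""
    contour = []
    for row in mask:
        cols = [x for x, v in enumerate(row) if v]
        contour.append(max(cols) if cols else -1)
    return contour


def lip_features(contour: List[int]) -> Tuple[int, int]:
    """Return (lip_separation, intrusion_depth) in pixels.

    Assumption: the contour covers only the mouth region, so the two most
    protruding local maxima are taken to be the upper and lower lip tips.
    """
    # Local maxima of the profile; a plateau contributes its first row only.
    peaks = [
        i for i in range(1, len(contour) - 1)
        if contour[i] > contour[i - 1] and contour[i] >= contour[i + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("need two lip tips in the profile")
    # Keep the two most protruding peaks, ordered top to bottom.
    upper, lower = sorted(sorted(peaks, key=lambda i: contour[i], reverse=True)[:2])
    # Deepest point of the profile between the two lip tips.
    valley = min(contour[upper:lower + 1])
    separation = lower - upper  # vertical distance between lip tips
    depth = min(contour[upper], contour[lower]) - valley  # how far the gap recedes
    return separation, depth


if __name__ == "__main__":
    # Toy 7x6 silhouette: 1 = face, 0 = background, speaker facing right.
    mask = [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 0],  # upper lip tip
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],  # mouth opening
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 0, 0],  # lower lip tip
        [1, 1, 0, 0, 0, 0],
    ]
    print(lip_features(profile_contour(mask)))
```

In a real-time system these two numbers would be computed for every video frame and passed to the recognizer together with the acoustic features; that coupling is outside the scope of this sketch.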

Attorney, Agent or Firm: Scully, Scott, Murphy & Presser; Kaufman, Esq., Stephen C.;

Primary / Asst. Examiners: Smits, Talivaldis I.; Nolan, Daniel A.

What is claimed is: 1. An apparatus for acquiring video images and video data for use with automated acoustic speech recognition systems, comprising:
  • a video camera positioned to capture lateral views of a speaker's mouth area while uttering speech, the video camera capturing lateral images; and
  • first means for computing image features from a set of only captured lateral images,
  • wherein the computed image features are used with acoustics signals of the uttered speech for automated speech recognition.

Patent     Pub.Date  Inventor       Assignee                     Title
US4757541  1988-07   Beadles        Research Triangle Institute  Audio visual speech recognition
US4769845  1988-09   Nakamura       Kabushiki Kaisha Carrylab    Method of recognizing speech using a lip image
US4975960  1990-12   Petajan                                     Electronic facial tracking and detection system and method and apparatus for automated speech recognition
US5286205  1994-02   Inouye et al.                               Method for teaching spoken English using mouth position characters
US5586215  1996-12   Stork et al.   Ricoh Corporation            Neural network acoustic and visual speech recognition system
US5806036  1998-09   Stork          Ricoh Company, Ltd.          Speechreading using facial feature parameters from a non-direct frontal view of the speaker
Foreign References:
Publication  Date     IPC Code   Assignee  Title
GB0254409    1988-04  G10L 5/06

Other Abstract Info: DERABS G2000-354532

Other References:
  • Benoit, "Synthesis and Automatic Recognition of Audio-Visual Speech", Integrated Audio-Visual Processing for Recognition, Synthesis & Communication colloquium, IEEE, Nov. 28, 1996.
  • Wu, et al., "Speaker-Independent Vowel Recognition Combining Voice Features and Mouth Shape Image With Neural Network"; Systems and Computers in Japan, vol. 22, No. 4, pp. 100-107 (1991).
  • Silsbee, et al., "Automatic Lipreading"; Biomedical Sciences Instrumentation, v 29, pp. 415-422 (1993).
  • Silsbee, et al., "Audio Visual Speech Recognition For A Vowel Discrimination Task"; Proc. SPIE-Int. Soc. Opt. Eng. (USA) v 2094, pp. 84-95 (1993).
  • Kenji Mase, et al., "Automatic Lipreading by Optical-Flow Analysis"; Systems and Computers in Japan, vol. 22, No. 6 (1991).
  • Lalit R. Bahl, et al; "Performance of the IBM Large Vocabulary Continuous Speech Recognition System on the ARPA Wall Street Journal Task"; Computer Science RC 19635 (87076) (1994).

