 The Delphion Integrated View

Title: US6219640: Methods and apparatus for audio-visual speaker recognition and utterance verification


Country: US United States of America

17 pages

 
Inventor: Basu, Sankar; Tenafly, NJ
Beigi, Homayoon S. M.; Yorktown Heights, NY
Maes, Stephane Herman; Danbury, CT
Ghislain Maison, Benoit Emmanuel; White Plains, NY
Neti, Chalapathy Venkata; Yorktown Heights, NY
Senior, Andrew William; New York, NY

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2001-04-17 / 1999-08-06

Application Number: US1999000369706

IPC Code: Advanced: G06K 9/00; G06K 9/62; G06K 9/68; G06T 7/00; G07C 9/00; G10L 17/00; G10L 21/06; G10L 15/22;
IPC-7: G10L 15/00;

ECLA Code: G06K9/62F3M; G06K9/00F; G06K9/00X; G07C9/00C2D;

U.S. Class: Current: 704/246; 704/231; 704/273;
Original: 704/246; 704/231; 704/273;

Field of Search: 382/115,118 379/88.02 704/273,246,231,251,275

Priority Number:
1999-08-06  US1999000369706

Abstract:     Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification. In a supervised utterance verification embodiment, the processed video signal is compared with a script representing an audio signal associated with the video signal to determine a level of correlation between the signals.
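The unsupervised utterance verification described in the abstract compares the processed audio with the processed video to obtain a level of correlation. A minimal sketch of that idea follows. It is not the patented method: it assumes one audio energy value and one mouth-opening measurement per video frame, uses a plain Pearson correlation as the comparison, and the function names and the 0.5 threshold are hypothetical choices made only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant track carries no evidence of correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def utterance_is_verified(audio_energy, mouth_opening, threshold=0.5):
    """Accept the utterance when audio and visual activity move together.

    audio_energy and mouth_opening are hypothetical per-frame features;
    threshold is an assumed tuning parameter, not a value from the patent.
    """
    return pearson(audio_energy, mouth_opening) >= threshold
```

With these assumptions, a speaker whose lips open when the audio is loud yields a high correlation and is accepted, while a recording played back over a still or unrelated face yields a low one and is rejected.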

Attorney, Agent or Firm: Otterstedt, Paul J.; Ryan, Mason & Lewis, LLP

Primary / Asst. Examiners: Hudspeth, David; Abebe, Daniel

Parent Case:

CROSS REFERENCE TO RELATED APPLICATIONS
    The present application is related to the U.S. patent application entitled: "Methods And Apparatus for Audio-Visual Speech Detection and Recognition," filed concurrently herewith and incorporated by reference herein.

Family: 3 known family members

First Claim (61 claims in total):
What is claimed is:
1. A method of performing speaker recognition, the method comprising the steps of:
  • processing a video signal associated with an arbitrary content video source;
  • processing an audio signal associated with the video signal; and
  • making at least one of an identification and verification decision based on the processed audio signal and the processed video signal.
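The decision step of claim 1 can be illustrated with the score combination approach that the abstract names as one possible embodiment. The sketch below is only an assumption-laden example: it supposes that the audio and video processing each produce one normalized score per enrolled speaker, and it fuses them with a weighted sum. The function names, the dictionary layout, the weight, and the threshold are all hypothetical and are not taken from the patent.

```python
def combine_scores(audio_scores, video_scores, audio_weight=0.5):
    """Fuse per-speaker audio and video scores with a weighted sum.

    Both arguments map speaker name -> score (higher means more likely).
    Only speakers scored by both modalities are kept.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    shared = audio_scores.keys() & video_scores.keys()
    return {
        speaker: audio_weight * audio_scores[speaker]
        + (1.0 - audio_weight) * video_scores[speaker]
        for speaker in shared
    }


def identify(audio_scores, video_scores, audio_weight=0.5):
    """Identification: return the enrolled speaker with the best fused score."""
    combined = combine_scores(audio_scores, video_scores, audio_weight)
    if not combined:
        raise ValueError("no speaker was scored by both modalities")
    return max(combined, key=combined.get)


def verify(claimed, audio_scores, video_scores, threshold, audio_weight=0.5):
    """Verification: accept the claimed identity if its fused score is high enough."""
    combined = combine_scores(audio_scores, video_scores, audio_weight)
    return claimed in combined and combined[claimed] >= threshold


# Example with made-up scores for two enrolled speakers.
audio = {"alice": 0.9, "bob": 0.4}
video = {"alice": 0.2, "bob": 0.8}
best_equal_weights = identify(audio, video)                  # video evidence tips the result
best_audio_heavy = identify(audio, video, audio_weight=0.9)  # audio evidence dominates
```

The weight makes the trade-off explicit: when one modality is degraded (noisy audio, poor lighting), shifting the weight toward the other changes which speaker wins.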


Forward References: 32 U.S. patent(s) reference this one

       
U.S. References: 7 backward references (listed below); 32 forward references

Patent     Pub. Date  Pages  Inventor          Assignee                                     Title
US4449189  1984-05    8      Feix et al.       Siemens Corporation                          Personal access control system using speech and face recognition
US4757451  1988-07    30     Beadles           Teraoka Seiko Co., Limited                   Film folding control apparatus of wrapping system
US4845636  1989-07    11     Walker            (not listed)                                 Remote transaction system
US5412738  1995-05    12     Brunelli et al.   Istituto Trentino Di Cultura                 Recognition system, particularly for recognising people
US5602933  1997-02    22     Blackwell et al.  Scientific-Atlanta, Inc.                     Method and apparatus for verification of remotely accessed data
US5625704  1997-04    26     Prasad            Ricoh Corporation                            Speaker recognition using spatiotemporal cues
US5897616  1999-04    15     Kanevsky et al.   International Business Machines Corporation  Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases
       
Foreign References: None

Other References:
  • C. Neti et al., "Audio-Visual Speaker Recognition For Video Broadcast News", Proceedings of the ARPA HUB4 Workshop, Washington, D.C., pp. 1-3, Mar. 1999.
  • A.W. Senior, "Face and Feature Finding For a Face Recognition System," Second International Conference on Audio-and Video-based Biometric Person Authentication, Washington, D.C., pp. 1-6, Mar. 1999.
  • P. De Cuetos et al., "Frontal Pose Detection for Human-Computer Interaction," pp. 1-12, Jun. 23, 1999.
  • R. Stiefelhagen et al., "Real-Time Lip-Tracking for Lipreading," Interactive Systems Laboratories, University of Karlsruhe, Germany and Carnegie Mellon University, U.S.A., pp. 1-4, Apr. 27, 1998.
  • P.N. Belhumeur et al., "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. on PAMI, pp. 1-34, Jul. 1997.
  • N.R. Garner et al., "Robust Noise Detection for Speech Detection and Enhancement," IEE, pp. 1-2, Nov. 5, 1996.
  • H. Ney, "On the Probabilistic Interpretation of Neural Network Classifiers and Discriminative Training Criteria," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, No. 2, pp. 107-112, Feb. 1995.
  • L. Wiskott et al., "Recognizing Faces by Dynamic Link Matching," ICANN '95, Paris, France, pp. 347-352, 1995.
  • A.H. Gee et al., "Determining the Gaze of Faces in Images," University of Cambridge, Cambridge, England, pp. 1-20, Mar. 1994.
  • C. Bregler et al., "Eigenlips For Robust Speech Recognition," IEEE, pp. II-669-II-672.
  • C. Benoit et al., "Which Components of the Face Do Humans and Machines Best Speechread?," Institut de la Communication Parlee, Grenoble, France, pp. 315-328.
  • Q. Summerfield, "Use of Visual Information for Phonetic Perception," Visual Information for Phonetic Perception, MRC Institute of Hearing Research, University Medical School, Nottingham, pp. 314-330.
  • N. Kruger et al., "Determination of Face Position and Pose With a Learned Representation Based on Label Graphs," Ruhr-Universitat Bochum, Bochum, Germany and University of Southern California, Los Angeles, CA, pp. 1-19.
  • G. Potamianos et al., "Discriminative Training of HMM Stream Exponents for Audio Visual Speech Recognition," AT&T Labs Research, Florham and Red Bank, NJ, pp. 1-4.

