Title: US6424946: Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering

Country: US United States of America

17 pages

Inventor: Tritschler, Alain Charles Louis; New York, NY
Viswanathan, Mahesh; Yorktown Heights, NY

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2002-07-23 / 1999-11-05

Application Number: US1999000434604

IPC Code: Advanced: G06F 3/16; G06F 17/30; G10L 15/00; G10L 15/08; G10L 15/10; G10L 15/26; G10L 15/28; G10L 17/00;
IPC-7: G10L 15/22;

ECLA Code: G10L15/26A; G06F17/30U; G06K9/62B1P3; G10L17/00U;

U.S. Class: 704/272; 704/500; 704/275; 704/251;

Field of Search: 704/231,245,256,255,500,240,241,239,270,257,251,235,250,253,272,275,236,238,260,200

Priority Number:
1999-11-05  US1999000434604
1999-04-09  US1999000288724
1999-06-30  US1999000345237

Abstract:     A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. The speaker identification system uses an enrolled speaker database that includes background models for unenrolled speakers, such as "unenrolled male" or "unenrolled female," to assign a speaker label to each identified segment. Speaker labels are identified for each speech segment by comparing the segment utterances to the enrolled speaker database and finding the "closest" speaker, if any. A speech segment having an unknown speaker is initially assigned a general speaker label from the set of background models. The "unenrolled" segment is assigned a segment number and receives a cluster identifier assigned by the clustering system. If a given segment is assigned a temporary speaker label associated with an unenrolled speaker, the user can be prompted by the present invention to identify the speaker. Once the user assigns a speaker label to an audio segment having an unknown speaker, the same speaker name can be automatically assigned to any segments that are assigned to the same cluster and the enrolled speaker database can be automatically updated to enroll the previously unknown speaker.
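The last step of the abstract, where a user-supplied name is spread across a cluster and the speaker is enrolled, can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not taken from the patent text: the `Segment` fields, the `propagate_label` function, and the use of a plain set of names to stand in for the enrolled speaker database (a real system would presumably store a speaker model, not just a name).

```python
from dataclasses import dataclass

# Background-model labels named in the abstract.
BACKGROUND_LABELS = {"unenrolled male", "unenrolled female"}


@dataclass
class Segment:
    number: int      # segment number
    cluster_id: int  # identifier assigned by the clustering system
    label: str       # current speaker label


def propagate_label(segments, segment_number, speaker_name, enrolled):
    """Apply a user-supplied name to every background-labeled segment
    in the same cluster as the given segment, then enroll the name."""
    target = next(s for s in segments if s.number == segment_number)
    for s in segments:
        if s.cluster_id == target.cluster_id and s.label in BACKGROUND_LABELS:
            s.label = speaker_name
    # Assumption: enrollment is modeled as adding the name to a set.
    enrolled.add(speaker_name)
    return segments
```

For instance, if segments 1 and 2 share a cluster and both carry "unenrolled male", naming segment 1 relabels segment 2 as well, while segments in other clusters keep their labels.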

Attorney, Agent or Firm: Ryan, Mason & Lewis, LLP ; Otterstedt, Paul J. ;

Primary / Asst. Examiners: Dorvil, Richemond;

Related Applications:
Application Number  Filed       Patent  Pub. Date   Title
US1999000288724     1999-04-09          2002-02-05  Methods and apparatus for retrieving audio information using content and speaker information
US1999000345237     1999-06-30

Parent Case: This application is a continuation-in-part of U.S. patent application Ser. No. 09/345,237, filed Jun. 30, 1999, which is a continuation-in-part of U.S. patent application Ser. No. 09/288,724, filed Apr. 9, 1999, issued as U.S. Pat. No. 6,345,252, each assigned to the assignee of the present invention and incorporated by reference herein.


First Claim:
What is claimed is:     1. A method for identifying a speaker in an audio source, said method comprising the steps of:
  • transcribing said audio source to create a textual version of said audio information;
  • identifying potential segment boundaries in said audio source; and
  • assigning a speaker label to each identified segment, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker.
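The label-assignment step of claim 1 can be sketched as a nearest-model lookup over a database that holds both enrolled speakers and background models. This is only an illustration under stated assumptions: representing each model as a fixed-length feature vector, comparing with Euclidean distance, and the names `assign_label` and `speaker_db` are all hypothetical and are not specified in the claim.

```python
import math


def assign_label(segment_vec, speaker_db):
    """Return the label whose model vector is closest to the segment.

    speaker_db maps a label to a model vector and is assumed to include
    at least one background model for unenrolled speakers, so a segment
    from an unknown speaker still receives a general label.
    """
    return min(speaker_db, key=lambda name: math.dist(segment_vec, speaker_db[name]))


# Hypothetical database: one enrolled speaker plus two background models.
speaker_db = {
    "Alice": (0.0, 0.0),
    "unenrolled male": (5.0, 5.0),
    "unenrolled female": (-5.0, 5.0),
}
```

Because the background models sit in the same database as the enrolled speakers, no separate "unknown" branch is needed in this sketch: an unfamiliar voice simply falls closest to one of the background entries.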

U.S. References:

Patent     Pub. Date  Inventor         Assignee                                     Title
US5659662  1997-08    Wilcox et al.    Xerox Corporation                            Unsupervised speaker clustering for automatic speaker indexing of recorded audio data
US6185527  2001-02    Petkovic et al.  International Business Machines Corporation  System and method for automatic audio content analysis for word spotting, indexing, classification and retrieval

Foreign References: None

Other Abstract Info: DERABS G2001-149202 DERABS G2001-285964

Other References:
  • ICASSP-97. Roy et al., "Speaker Identification based text to audio alignment for audio retrieval system". pp. 1099-1102, vol. 2. Apr. 1997.*
  • S. Dharanipragada et al., "Experimental Results in Audio Indexing," Proc. ARPA SLT Workshop, (Feb. 1996).
  • L. Polymenakos et al., "Transcription of Broadcast News--Some Recent Improvements to IBM's LVCSR System," Proc. ARPA SLT Workshop, (Feb. 1996).
  • R. Bakis, "Transcription of Broadcast News Shows with the IBM Large Vocabulary Speech Recognition System," Proc. ICASSP98, Seattle, WA (1998).
  • H. Beigi et al., "A Distance Measure Between Collections of Distributions and its Application to Speaker Recognition," Proc. ICASSP98, Seattle, WA (1998).
  • S. Chen, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion," Proceedings of the Speech Recognition Workshop (1998).
  • S. Chen et al., "Clustering via the Bayesian Information Criterion with Applications in Speech Recognition," Proc. ICASSP98, Seattle, WA (1998).
  • S. Chen et al., "IBM's LVCSR System for Transcription of Broadcast News Used in the 1997 Hub4 English Evaluation," Proceedings of the Speech Recognition Workshop (1998).
  • S. Dharanipragada et al., "A Fast Vocabulary Independent Algorithm for Spotting Words in Speech," Proc. ICASSP98, Seattle, WA (1998).
  • J. Navratil et al., "An Efficient Phonotactic-Acoustic system for Language Identification," Proc. ICASSP98, Seattle, WA (1998).
  • G. N. Ramaswamy et al., "Compression of Acoustic Features for Speech Recognition in Network Environments," Proc. ICASSP98, Seattle, WA (1998).
  • S. Chen et al., "Recent Improvements to IBM's Speech Recognition System for Automatic Transcription of Broadcast News," Proceedings of the Speech Recognition Workshop (1999).
  • S. Dharanipragada et al., "Story Segmentation and Topic Detection in the Broadcast News Domain," Proceedings of the Speech Recognition Workshop (1999).
  • C. Neti et al., "Audio-Visual Speaker Recognition for Video Broadcast News," Proceedings of the Speech Recognition Workshop (1999).
