 The Delphion Integrated View

Title: US6421645: Methods and apparatus for concurrent speech recognition, speaker segmentation and speaker classification

Country: US United States of America

14 pages

Inventor: Beigi, Homayoon Sadr Mohammad; Yorktown Heights, NY
Tritschler, Alain Charles Louis; New York, NY
Viswanathan, Mahesh; Yorktown Heights, NY

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2002-07-16 / 1999-06-30

Application Number: US1999000345237

IPC Code: Advanced: G06F 3/16; G06F 17/30; G10L 15/00; G10L 15/08; G10L 15/10; G10L 15/26; G10L 15/28; G10L 17/00;
IPC-7: G10L 15/00;

ECLA Code: G06K9/62B1P3; G06F17/30U1T; G10L15/26A; G10L17/00U;

U.S. Class: 704/272; 704/500; 704/275; 704/251;

Field of Search: 704/231,500,245,239,241,240,256,255,251,235,253,270,257,272,275,260,236,238

Priority Number:
1999-06-30  US1999000345237
1999-04-09  US1999000288724

Abstract:     A method and apparatus are disclosed for automatically transcribing audio information from an audio-video source and concurrently identifying the speakers. The disclosed audio transcription and speaker classification system includes a speech recognition system, a speaker segmentation system and a speaker identification system. A common front-end processor computes feature vectors that are processed along parallel branches in a multi-threaded environment by the speech recognition system, speaker segmentation system and speaker identification system, for example, using a shared memory architecture that acts in a server-like manner to distribute the computed feature vectors to a channel associated with each parallel branch. The speech recognition system produces transcripts with time-alignments for each word in the transcript. The speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. The speaker identification system thereafter uses an enrolled speaker database to assign a speaker to each identified segment. The audio information from the audio-video source is concurrently transcribed and segmented to identify segment boundaries. Thereafter, the speaker identification system assigns a speaker label to each portion of the transcribed text.
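The fan-out described in the abstract, where one front end computes feature vectors and distributes them to a channel per parallel branch, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration rather than the patented implementation: the names front_end, branch and run_pipeline are hypothetical, thread-safe queues stand in for the shared memory architecture, a per-frame mean stands in for a real acoustic feature vector, and each branch merely collects what it receives instead of recognizing, segmenting or identifying. In the abstract the identification step acts on segments found by the segmentation branch; the sketch shows only the shared feature distribution.

```python
import queue
import threading

SENTINEL = None  # marks the end of the feature stream on every channel


def front_end(frames, channels):
    """Compute one feature per frame and copy it to every branch channel."""
    for index, frame in enumerate(frames):
        # Placeholder feature: the mean sample value of the frame (assumption).
        feature = (index, sum(frame) / len(frame))
        for channel in channels.values():
            channel.put(feature)
    for channel in channels.values():
        channel.put(SENTINEL)


def branch(name, channel, results):
    """Consume features from one channel until the sentinel arrives."""
    received = []
    while True:
        item = channel.get()
        if item is SENTINEL:
            break
        received.append(item)  # a real branch would process the feature here
    results[name] = received


def run_pipeline(frames):
    """Run the front end and three consumer branches concurrently."""
    names = ("recognition", "segmentation", "identification")
    channels = {name: queue.Queue() for name in names}
    results = {}
    workers = [
        threading.Thread(target=branch, args=(name, channels[name], results))
        for name in names
    ]
    for worker in workers:
        worker.start()
    front_end(frames, channels)
    for worker in workers:
        worker.join()
    return results
```

Calling run_pipeline with a list of frames (each a non-empty list of numbers) returns a dictionary in which all three branches hold the same ordered feature sequence, which is the property the common front end is meant to provide.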

Attorney, Agent or Firm: Ryan, Mason & Lewis, LLP ; Otterstedt, Paul J. ;

Primary / Asst. Examiners: Dorvil, Richemond;


Related Applications:
Application Number   Filed        Patent      Pub. Date    Title
US1999000288724      1999-04-09   US6345252   2002-02-05   Methods and apparatus for retrieving audio information using content and speaker information

Parent Case:     This application is a continuation-in-part of U.S. patent application Ser. No. 09/288,724, filed Apr. 9, 1999, now issued as U.S. Pat. No. 6,345,252, which is assigned to the assignee of the present invention and incorporated by reference herein.


Family: 15 known family members

First Claim (of 23 claims):
What is claimed is:     1. A method for transcribing audio information from one or more audio sources, said method comprising the steps of:
  • transcribing said audio source to create a textual version of said audio information;
  • identifying potential segment boundaries in said audio source substantially concurrently with said transcribing step; and
  • assigning a speaker label to each identified segment.
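The last step of claim 1 can be pictured with a small Python sketch that combines the time-aligned transcript with the identified segments. This is an illustrative assumption, not the claimed method: the function name label_words, the (text, start_time) word format, and the use of segment start times as boundaries are all hypothetical, and the speaker labels are taken as given rather than produced by an enrolled-speaker database.

```python
from bisect import bisect_right


def label_words(words, boundaries, speakers):
    """Attach a speaker label to each time-aligned word.

    words:      list of (text, start_time) pairs from the transcript.
    boundaries: sorted list of segment start times (hypothetical format).
    speakers:   one speaker label per segment, same length as boundaries.
    """
    if len(boundaries) != len(speakers):
        raise ValueError("need exactly one speaker label per segment")
    labeled = []
    for text, start in words:
        # Index of the last segment that starts at or before this word.
        segment = bisect_right(boundaries, start) - 1
        # Words before the first boundary fall back to the first segment.
        labeled.append((speakers[max(segment, 0)], text))
    return labeled
```

Because the transcript and the boundaries are produced concurrently, a lookup like this only needs the two outputs to share a time base; a binary search keeps the cost per word logarithmic in the number of segments.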

Forward References: 33 U.S. patent(s) reference this one

U.S. References: Forward references (33)   |   Backward references (2)

Patent      Pub. Date   Inventor          Assignee                                      Title
US5659662   1997-08     Wilcox et al.     Xerox Corporation                             Unsupervised speaker clustering for automatic speaker indexing of recorded audio data
US6185527   2001-02     Petkovic et al.   International Business Machines Corporation   System and method for automatic audio content analysis for word spotting, indexing, classification and retrieval
Foreign References: None

Other Abstract Info: DERABS G2001-149202 DERABS G2001-285964

Other References:
  • Roy et al., "Speaker Identification based text to audio alignment for audio retrieval system," Proc. ICASSP-97, vol. 2, pp. 1099-1102 (Apr. 1997).*
  • S. Dharanipragada et al., "Experimental Results in Audio Indexing," Proc. ARPA SLT Workshop, (Feb. 1996).
  • L. Polymenakos et al., "Transcription of Broadcast News--Some Recent Improvements to IBM's LVCSR System," Proc. ARPA SLT Workshop, (Feb. 1996).
  • R. Bakis, "Transcription of Broadcast News Shows with the IBM Large Vocabulary Speech Recognition System," Proc. ICASSP98, Seattle, WA (1998).
  • H. Beigi et al., "A Distance Measure Between Collections of Distributions and its Application to Speaker Recognition," Proc. ICASSP98, Seattle, WA (1998).
  • S. Chen, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion," Proceedings of the Speech Recognition Workshop (1998).
  • S. Chen et al., "Clustering via the Bayesian Information Criterion with Applications in Speech Recognition," Proc. ICASSP98, Seattle, WA (1998).
  • S. Chen et al., "IBM's LVCSR System for Transcription of Broadcast News Used in the 1997 Hub4 English Evaluation," Proceedings of the Speech Recognition Workshop (1998).
  • S. Dharanipragada et al., "A Fast Vocabulary Independent Algorithm for Spotting Words in Speech," Proc. ICASSP98, Seattle, WA (1998).
  • J. Navratil et al., "An Efficient Phonotactic-Acoustic system for Language Identification," Proc. ICASSP98, Seattle, WA (1998).
  • G. N. Ramaswamy et al., "Compression of Acoustic Features for Speech Recognition in Network Environments," Proc. ICASSP98, Seattle, WA (1998).
  • S. Chen et al., "Recent Improvements to IBM's Speech Recognition System for Automatic Transcription of Broadcast News," Proceedings of the Speech Recognition Workshop (1999).
  • S. Dharanipragada et al., "Story Segmentation and Topic Detection in the Broadcast News Domain," Proceedings of the Speech Recognition Workshop (1999).
  • C. Neti et al., "Audio-Visual Speaker Recognition for Video Broadcast News," Proceedings of the Speech Recognition Workshop (1999).
