The Delphion Integrated View
Title: US6009392: Training speech recognition by matching audio segment frequency of occurrence with frequency of words and letter combinations in a corpus


Country: US (United States of America)

Pages: 20
 
Inventors: Kanevsky, Dimitri (Ossining, NY); Zadrozny, Wlodek Wlodzimierz (Tarrytown, NY)

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 1999-12-28 / 1998-01-15

Application Number: US1998000007478

IPC Code: Advanced: G01L 9/00;
IPC-7: G01L 9/00;

ECLA Code: G10L15/063; S10L15/063C;

U.S. Class: Current: 704/245; 704/240; 704/255;
Original: 704/245; 704/240; 704/255;

Field of Search: 704/243,255,245,240,236,257,270 382/228

Priority Number:
1998-01-15  US1998000007478

Abstract:     A method is provided which trains acoustic models in an automatic speech recognizer ("ASR") without explicitly matching decoded scripts with correct scripts from which acoustic training data is generated. In the method, audio data is input and segmented to produce audio segments. The audio segments are clustered into groups of clustered audio segments such that the clustered audio segments in each of the groups have similar characteristics. Also, the groups respectively form audio similarity classes. Then, audio segment probability distributions for the clustered audio segments in the audio similarity classes are calculated, and audio segment frequencies for the clustered audio segments are determined based on the audio segment probability distributions. The audio segment frequencies are matched to known audio segment frequencies for at least one of letters, combination of letters, and words to determine frequency matches, and a textual corpus of words is formed based on the frequency matches. Then, acoustic models of the automatic speech recognizer are trained based on the textual corpus. In addition, the method may receive and cluster video or biometric data, and match such data to the audio data to more accurately cluster the audio segments into the groups of audio segments. Also, an apparatus for performing the method is provided.

Attorney, Agent or Firm: Sughrue, Mion, Zinn, Macpeak & Seas, PLLC

Primary / Asst. Examiners: Hudspeth, David R.; Storm, Donald L.

Maintenance Status: E2 Expired


Family: None

First Claim (of 47):
What is claimed:     1. A method for training an automatic speech recognizer, comprising the steps of:
  • (a) inputting audio data;
  • (b) segmenting said audio data to produce audio segments of said audio data;
  • (c) clustering said audio segments into groups of clustered audio segments, wherein said clustered audio segments in each of said groups have similar characteristics and wherein said groups respectively form audio similarity classes;
  • (d) calculating audio segment probability distributions for said clustered audio segments in said audio similarity classes;
  • (e) determining audio segment frequencies for said clustered audio segments in said audio similarity classes based on said audio segment probability distributions;
  • (f) matching said audio segment frequencies to known audio segment frequencies for at least one of letters, combination of letters, and words to determine frequency matches;
  • (g) forming a textual corpus of words based on said frequency matches; and
  • (h) training acoustic models of said automatic speech recognizer based on said textual corpus.
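The claimed pipeline, steps (a) through (g), can be sketched in miniature. The snippet below is an illustrative assumption, not the patent's implementation: it uses toy one-dimensional "acoustic features," a greedy distance threshold in place of real acoustic clustering, and hypothetical letter frequencies, and it stops short of step (h), the actual acoustic-model training.

```python
from collections import Counter

def cluster_segments(segments, tol=0.5):
    """Steps (b)-(c): greedily group segments whose (toy, 1-D) feature
    values lie within `tol` of an existing cluster centre; each cluster
    stands in for one audio similarity class."""
    centres, labels = [], []
    for s in segments:
        for i, c in enumerate(centres):
            if abs(s - c) <= tol:
                labels.append(i)
                break
        else:
            centres.append(s)
            labels.append(len(centres) - 1)
    return labels

def match_frequencies(labels, known_freqs):
    """Steps (d)-(f): rank the similarity classes by empirical frequency
    and align them, rank for rank, with symbols ranked by their known
    frequency of occurrence."""
    counts = Counter(labels)
    ranked_classes = [c for c, _ in counts.most_common()]
    ranked_symbols = sorted(known_freqs, key=known_freqs.get, reverse=True)
    return {c: ranked_symbols[i] for i, c in enumerate(ranked_classes)}

# Toy 1-D "acoustic features" for eight audio segments (step (a))
segments = [0.1, 0.2, 2.0, 2.1, 0.15, 4.0, 2.05, 0.12]
labels = cluster_segments(segments)

# Hypothetical known letter frequencies (not from the patent)
known = {"e": 0.127, "t": 0.091, "a": 0.082}
mapping = match_frequencies(labels, known)

# Step (g): form a textual corpus from the frequency matches
corpus = "".join(mapping[label] for label in labels)
```

Even in this toy form the essential idea survives: the similarity classes are never decoded against a reference script; they are only aligned with letters, letter combinations, or words by comparing frequencies, and the resulting corpus is what would feed acoustic-model training in step (h).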



Forward References: 30 U.S. patents reference this patent

U.S. References (4 backward; 30 forward):

Patent     Pub. Date  Inventor          Assignee                                     Title
US5122951  1992-06    Kayima            Sharp Kabushiki Kaisha                       Subject and word associating devices
US5625748  1997-04    McDonough et al.  BBN Corporation                              Topic discriminator using posterior probability or confidence scores
US5649060  1997-07    Ellozy et al.     International Business Machines Corporation  Automatic indexing and aligning of audio and text using speech recognition
US5659662  1997-08    Wilcox et al.     Xerox Corporation                            Unsupervised speaker clustering for automatic speaker indexing of recorded audio data
Foreign References: None
