The Delphion Integrated View

Title: US5640487: Building scalable n-gram language models using maximum likelihood maximum entropy n-gram models


Country: US (United States of America)

Pages: 17

Inventor: Lau, Raymond; Cambridge, MA
Rosenfeld, Ronald; Pittsburgh, PA
Roukos, Salim; Scarsdale, NY

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 1997-06-17 / 1995-06-07

Application Number: US1995000487299

IPC Code: Advanced: G06F 17/28; G10L 15/06; G10L 15/10; G10L 15/18; G10L 15/28;
IPC-7: G10L 5/06; G10L 9/00;

ECLA Code: G10L15/197; S10L15/183; S10L15/197;

U.S. Class: Current: 704/243; 704/240; 704/255; 704/E15.023;
Original: 395/002.52; 395/002.49; 395/002.64;

Field of Search: 381/041-46 395/2.4,2.49,2.45,2.52-2.54,2.59,2.64-2.66

Priority Number:
1995-06-07  US1995000487299
1993-02-26  US1993000023543

Abstract: The present invention is an n-gram language modeler which significantly reduces the memory storage requirement and convergence time for language modelling systems and methods. The present invention aligns each n-gram with one of "n" number of non-intersecting classes. A count is determined for each n-gram representing the number of times each n-gram occurred in the training data. The n-grams are separated into classes and complement counts are determined. Using these counts and complement counts, factors are determined, one factor for each class, using an iterative scaling algorithm. The language model probability, i.e., the probability that a word occurs given the occurrence of the previous two words, is determined using these factors.
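The abstract's pipeline starts from raw n-gram counts taken over the training data; the per-class factors fit by iterative scaling then replace the plain relative-frequency estimate. A minimal Python sketch of that starting point, using a toy corpus; the names "ngram_counts" and "p_ml" are illustrative, not from the patent:

    from collections import Counter

    def ngram_counts(tokens, n=3):
        """Count every n-gram (tuple of n consecutive words) in the training data."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    # Toy training data; real training corpora are far larger.
    tokens = "the cat sat on the mat the cat ran".split()
    counts = ngram_counts(tokens, n=3)

    # Maximum-likelihood estimate of P(wn | w1, ..., wn-1) from raw counts.
    # The patented model replaces this estimate with per-class factors fit
    # by an iterative scaling algorithm; this is only the baseline.
    history_totals = Counter()
    for ngram, c in counts.items():
        history_totals[ngram[:-1]] += c

    def p_ml(*ngram):
        total = history_totals.get(ngram[:-1], 0)
        return counts.get(ngram, 0) / total if total else 0.0

    print(p_ml("the", "cat", "sat"))  # 0.5: "the cat" precedes "sat" once and "ran" once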

Attorney, Agent or Firm: Sterne, Kessler, Goldstein & Fox P.L.L.C. ; Tasinari, Robert ;

Primary / Asst. Examiners: Sheikh, Ayaz R.; Edouard, Patrick N.


       
Related Applications:
Application Number | Filed      | Patent    | Pub. Date  | Title
US1993000023543    | 1993-02-26 | US5467425 | 1995-11-14 | Building scalable N-gram language models using maximum likelihood maximum entropy N-gram models


       
Parent Case: This application is a division of Ser. No. 08/023,543, filed Feb. 26, 1993, now U.S. Pat. No. 5,467,425, issued Nov. 14, 1995.

Family: 4 known family members

First Claim (of 10):
What is claimed is: 1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform, in a computer based language modelling system receiving data in the form of a series of n-grams, each n-gram comprising a series of "n" words (w1, w2, ..., wn), each n-gram having an associated count, method steps for classifying the n-grams into non-redundant classes, said method steps comprising:
  • (a) comparing the count of each n-gram to a first threshold value and classifying each n-gram with a count greater than said first threshold in a first class;
  • (b) associating all n-grams not classified in step (a) with a putative (n-1)-gram class, each said putative (n-1)-gram class having the same last "n-1" words (w2, w3, ..., wn);
  • (c) establishing a complement count for each said putative (n-1)-gram class by summing the counts of each n-gram in said putative (n-1)-gram class; and
  • (d) comparing said complement count of each said putative (n-1)-gram class to a second threshold value and classifying each said putative (n-1)-gram class with a count greater than said second threshold in a second class.
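Steps (a) through (d) above amount to a two-pass thresholding scheme: high-count n-grams each stand alone, and the remaining n-grams are pooled by their last "n-1" words and kept as a class only if the pooled (complement) count clears a second threshold. A short Python sketch under those assumptions; the name "classify_ngrams" and the threshold values are illustrative, not from the patent:

    from collections import defaultdict

    def classify_ngrams(counts, t1, t2):
        """Split n-grams into a high-count first class and putative
        (n-1)-gram classes whose complement counts exceed a second threshold."""
        first_class = {}               # step (a): n-grams with count > t1
        complement = defaultdict(int)  # steps (b)-(c): pool leftovers by last n-1 words
        for ngram, c in counts.items():
            if c > t1:
                first_class[ngram] = c
            else:
                complement[ngram[1:]] += c  # complement count for (w2, ..., wn)
        # step (d): keep (n-1)-gram classes whose complement count exceeds t2
        second_class = {tail: c for tail, c in complement.items() if c > t2}
        return first_class, second_class

    counts = {("the", "cat", "sat"): 5, ("a", "cat", "sat"): 1, ("one", "cat", "sat"): 1}
    high, pooled = classify_ngrams(counts, t1=3, t2=1)
    print(high)    # {('the', 'cat', 'sat'): 5}
    print(pooled)  # {('cat', 'sat'): 2}: two rare trigrams share the tail "cat sat"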


Forward References: 29 U.S. patents reference this one

       
U.S. References (4 backward):

Patent    | Pub. Date | Inventor     | Assignee                                    | Title
US4817156 | 1989-03   | Bahl et al.  | International Business Machines Corporation | Rapidly training a speech recognizer to a subsequent speaker given training data of a reference speaker
US4831550 | 1989-05   | Katz         | International Business Machines Corporation | Apparatus and method for estimating, from sparse data, the probability that a particular one of a set of events is the next event in a string of events
US5293584 | 1994-03   | Brown et al. | International Business Machines Corporation | Speech recognition system for natural language translation
US5467425 | 1995-11   | Lau et al.   | International Business Machines Corporation | Building scalable N-gram language models using maximum likelihood maximum entropy N-gram models

Foreign References: None

Other Abstract Info: DERABS G1995-403732; DERABS G1997-414037

Other References:
  • Bahl, Lalit R., Frederick Jelinek and Robert L. Mercer, "A Maximum Likelihood Approach to Continuous Speech Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-5, No. 2, Mar. 1983, pp. 179-190. (12 pages) Cited by 42 patents
  • Ney et al., "On Smoothing Techniques for Bigram-Based Natural Language Modelling", ICASSP '91, 1991, pp. 825-828.
  • Paeseler et al., "Continuous-Speech Recognition Using a Stochastic Language Model", ICASSP '89, 1989, pp. 719-722.
  • Jelinek et al., "Classifying Words for Improved Statistical Language Models", ICASSP '90, 1990, pp. 621-624.

