

 The Delphion Integrated View

Title: US4831550: Apparatus and method for estimating, from sparse data, the probability that a particular one of a set of events is the next event in a string of events


Country: US United States of America

Pages: 18

 
Inventor: Katz, Slava M.; Westport, CT

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 1989-05-16 / 1986-03-27

Application Number: US1986000844904

IPC Code: Advanced: G06K 9/72; G10L 11/00; G10L 15/00; G10L 15/10; G10L 15/14; G10L 15/18;
IPC-7: G10L 1/00;

ECLA Code: G10L15/00; G06K9/72L; G10L15/14; T05K999/99;

U.S. Class: Current: 704/240;
Original: 364/513.5; 381/043;

Field of Search: 381/043 364/513.5

Priority Number:
1986-03-27  US1986000844904

Abstract:     Apparatus and method for evaluating the likelihood of an event (such as a word) following a string of known events, based on event sequence counts derived from sparse sample data. Event sequences (or m-grams) include a key and a subsequent event. For each m-gram which was counted in the sample data, there is stored a discounted probability P generated by applying a modified Turing's estimate, for example, to a count-based probability. For a key occurring in the sample data there is stored a normalization constant alpha which (a) adjusts the discounted probabilities for multiple counting, if any, and (b) includes a freed probability mass allocated to m-grams which do not occur in the sample data. To determine the likelihood of a selected event following a string of known events, a "backing off" scheme is employed in which successively shorter included keys (of known events) followed by the selected event (representing m-grams) are searched (302, 308) until an m-gram is found having a discounted probability stored therefor. The normalization constants (306, 312) of the longer searched keys, for which the corresponding m-grams have no stored discounted probability, are combined together with the found discounted probability to produce (304, 310, 314) the likelihood of the selected event being next.
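As a minimal sketch of the discounting step described in the abstract, the Python below computes, for the m-grams sharing one key, discounted probabilities and the freed probability mass β left over for m-grams absent from the sample. It assumes a Katz-style form of the modified Turing's estimate (this record does not give the exact formula); the function names, the threshold `k`, and the fallback rule for out-of-range discounts are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def turing_discounts(counts: Dict[Hashable, int], k: int = 5) -> Dict[int, float]:
    """Discount ratio d_r for counts r = 1..k; counts above k stay undiscounted.

    Assumed (Katz-style) form of a modified Turing's estimate:
        d_r = (r*/r - A) / (1 - A),  r* = (r + 1) * n[r+1] / n[r],
        A = (k + 1) * n[k+1] / n[1],
    where n[r] is the number of distinct m-grams counted exactly r times.
    """
    n = Counter(counts.values())  # count-of-counts
    if n[1] == 0:
        raise ValueError("need at least one m-gram counted exactly once")
    a = (k + 1) * n[k + 1] / n[1]
    d: Dict[int, float] = {}
    for r in range(1, k + 1):
        if n[r] == 0:
            continue
        r_star = (r + 1) * n[r + 1] / n[r]
        d_r = (r_star / r - a) / (1 - a) if a < 1 else 1.0
        # Assumption: use no discount when sparse counts push d_r outside (0, 1].
        d[r] = d_r if 0 < d_r <= 1 else 1.0
    return d


def discounted_distribution(
    followers: Dict[str, int], d: Dict[int, float]
) -> Tuple[Dict[str, float], float]:
    """Discounted probabilities for the m-grams sharing one key, plus freed mass beta."""
    total = sum(followers.values())
    p = {w: d.get(c, 1.0) * c / total for w, c in followers.items()}
    return p, 1.0 - sum(p.values())
```

For a key followed by *cat* twice and by *mat* and *rat* once each, with hypothetical discounts d₁ = 0.5 and d₂ = 0.75, `discounted_distribution` gives P(cat) = 0.375 and P(mat) = P(rat) = 0.125, leaving β = 0.375 to be shared among m-grams not seen in the sample.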

Attorney, Agent or Firm: Schechter, Marc D.

Primary / Asst. Examiners: Harkcom, Gary V.; Lynt, Christopher H.

Maintenance Status: E3 Expired
CC Certificate of Correction issued


Designated Country: DE FR GB IT 

Family: 8 known family members

First Claim (claim 1 of 14):
I claim:     1. In a speech recognition system, a computer-implemented method of evaluating the likelihood of a word from a vocabulary of words occurring next after a string of known words, based on counts of word sequences occurring in a sample text which is sparse relative to possible word sequences, the method comprising the steps of:
  • (a) characterizing word sequences as m-grams, each m-gram occurring in the sample text representing a key of words followed by a word;
  • (b) storing a discounted probability P for each of at least some m-grams occurring in the sample text;
  • (c) generating a freed probability mass value βL for each key occurring in the sample text, the βL for a key of length L being allocated to those m-grams which (i) include the subject key and (ii) have no respective discounted probabilities stored therefor;
  • (d) generating γL factors, each γL factor being valued to normalize the probability distribution of only those m-grams which (i) are formed from a key of length L and (ii) are not included in a greater-included m-gram having a key of known words;
  • (e) storing, for each key of length L, a value αL = βLγL; and
  • (f) evaluating a likelihood of a selected word following a string of known words including the steps of:
    • (i) searching successively shorter keys of the known words until a key is found which, when followed by the at least one selected word, represents an m-gram having a discounted probability P stored therefor, and retrieving P;
    • (ii) retrieving the stored αL value for each longer key searched before the stored m-gram is found; and
    • (iii) computing a likelihood value of the selected word following the string of known words based on the retrieved αL values and the retrieved P value.
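Steps (a) through (f) of the claim can be sketched as a small backoff model. This is a minimal sketch under stated assumptions, not the patented implementation: a constant `discount` factor stands in for the modified Turing's estimate, the unigram base case is a plain relative frequency, a key that never occurred in the sample backs off with weight 1, and the names `BackoffModel` and `prob` are hypothetical. αL is computed as βL·γL, with γL renormalizing the lower-order probabilities over the words never seen after the key.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, ...]


class BackoffModel:
    """Backoff over m-grams of up to `max_order` words (hypothetical sketch).

    Assumptions: a constant `discount` replaces the modified Turing's estimate,
    and the unigram base case is a plain relative frequency.
    """

    def __init__(self, words: List[str], max_order: int = 3, discount: float = 0.5):
        self.max_order = max_order
        self.vocab = sorted(set(words))
        self.unigram = {w: c / len(words) for w, c in Counter(words).items()}
        # Step (a): count each key of m-1 words followed by a word, m = 2..max_order.
        counts: Dict[Key, Counter] = {}
        for m in range(2, max_order + 1):
            for i in range(len(words) - m + 1):
                key = tuple(words[i:i + m - 1])
                counts.setdefault(key, Counter())[words[i + m - 1]] += 1
        # Step (b): store a discounted probability for each counted m-gram.
        self.p_disc: Dict[Key, Dict[str, float]] = {}
        for key, followers in counts.items():
            total = sum(followers.values())
            self.p_disc[key] = {w: discount * c / total for w, c in followers.items()}
        # Steps (c)-(e): alpha = beta * gamma per key, shortest keys first so the
        # lower-order probabilities needed for gamma are already available.
        self.alpha: Dict[Key, float] = {}
        for key in sorted(self.p_disc, key=len):
            seen = self.p_disc[key]
            beta = 1.0 - sum(seen.values())  # freed probability mass
            rest = 1.0 - sum(self.prob(w, key[1:]) for w in seen)
            # gamma = 1 / rest; if every word followed this key, beta has nowhere to go.
            self.alpha[key] = beta / rest if rest > 1e-12 else 0.0

    def prob(self, word: str, history: Sequence[str]) -> float:
        """Step (f): search successively shorter keys, multiplying in alphas."""
        start = max(0, len(history) - (self.max_order - 1))
        key: Key = tuple(history[start:])
        weight = 1.0
        while key:
            stored = self.p_disc.get(key, {})
            if word in stored:
                return weight * stored[word]  # (i) P found; (iii) combine with alphas
            weight *= self.alpha.get(key, 1.0)  # (ii) alpha of the longer key (1.0 if unseen)
            key = key[1:]
        return weight * self.unigram.get(word, 0.0)
```

With `model = BackoffModel("the cat sat on the mat the cat ate the rat".split())`, `model.prob("cat", ("the",))` returns the stored discounted value 0.25, while `model.prob("mat", ("cat",))` finds no stored bigram and returns α for the key ("cat",) times the unigram estimate for *mat*. Because each α combines the freed mass with the γ renormalization, the probabilities over the vocabulary still sum to 1 for any history.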


Forward References: 33 U.S. patents reference this one

       
U.S. References (backward, 11):
Patent (Pub. Date); Inventor; Assignee; Title
  • US3188609* (1965-06); Harmon et al.
  • US3925761 (1975-12); Chaires et al.; International Business Machines Corporation; Binary reference matrix for a character recognition machine
  • US3969700 (1976-07); Bollinger et al.; International Business Machines Corporation; Regional context maximum likelihood error correction for OCR, keyboard, and the like
  • US4038503 (1977-07); Moshier; Dialog Systems, Inc.; Speech recognition apparatus
  • US4156868 (1979-05); Levinson; Bell Telephone Laboratories, Incorporated; Syntactic word recognizer
  • US4277644 (1981-07); Levinson et al.; Bell Telephone Laboratories, Incorporated; Syntactic continuous speech recognizer
  • US4400788 (1983-08); Myers et al.; Bell Telephone Laboratories, Incorporated; Continuous speech pattern recognizer
  • US4435617 (1984-03); Griggs; Speech-controlled phonetic typewriter or display device using two-tier approach
  • US4489435 (1984-12); Moshier; Exxon Corporation; Method and apparatus for continuous word string recognition
  • US4530110 (1985-07); Nojiri et al.; Nippondenso Co., Ltd.; Continuous speech recognition method and device
  • US4538234 (1985-08); Honda; Nippon Telegraph & Telephone Public Corporation; Adaptive predictive processing system
  * some details unavailable
       
Foreign References: None

Other Abstract Info: DERABS G87-322352

Other References:
  • L. R. Bahl et al., "Interpolation of Estimators Derived from Sparse Data," IBM Technical Disclosure Bulletin, vol. 24, No. 4, Sep. 1981.
  • F. J. Damerau, "Variable N-Gram Method for Statistical Language Processing," IBM Technical Disclosure Bulletin, vol. 24, No. 11A, Apr. 1982.
  • F. Jelinek et al., "Probability Distribution Estimation from Sparse Data," IBM Technical Disclosure Bulletin, vol. 28, No. 6, Nov. 1985.
  • A. J. Nadas, "Recursive Self-Smoothing of Linguistic Contingency Tables," IBM Technical Disclosure Bulletin, vol. 27, No. 7B, Dec. 1984.
  • F. Jelinek, "The Development of an Experimental Discrete Dictation Recognizer," Proceedings of the IEEE, vol. 73, No. 11, Nov. 1985, pp. 1616-1624.


Copyright © 1997-2014 Thomson Reuters