
 The Delphion Integrated View

Title: US5444617: Method and apparatus for adaptively generating field of application dependent language models for use in intelligent systems

Country: US United States of America

Pages: 16

Inventor: Merialdo, Bernard; Valbonne, France

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 1995-08-22 / 1993-12-14

Application Number: US1993000166777

IPC Code: Advanced: G06F 17/27; G06F 17/28; G10L 15/10; G10L 15/18; G10L 15/28;
IPC-7: G06F 15/38;

ECLA Code: G10L15/197; G06F17/27G; S10L15/183;

U.S. Class: Current: 704/009; 704/238; 704/E15.023;
Original: 364/419.1; 364/419.08; 395/002.47;

Field of Search: 364/419.01,419.02,419.03,419.08,419.19,419.10,419.11 395/2.44,2.47,2.49,2.51,2.54,2.64,2.66,2.79,2.86

Priority Number:
1992-12-17  EP1992000480198

Abstract:     A system architecture for providing human intelligible information by processing a flow of input data; e.g., converting speech (source information) into printable data (target information) based on target-dependent probabilistic models; and for enabling efficient switching from one target field of information into another. To that end, the system is provided with a language modeling device including a data base loadable with an application-dependent corpus of words and/or symbols through a workstation; and a language modeling processor programmed to refresh, in practice, a tree-organized model, efficiently, with no blocking situations, and at a reasonable cost.

Attorney, Agent or Firm: Timar, John J.

Primary / Asst. Examiners: Weinhardt, Robert A.

Maintenance Status: E1 Expired


Designated Country: DE FR GB 

Family: 4 known family members

First Claim (claim 1 of 12):
I claim:     1. An improved method for constructing a target field dependent model in the form of a decision tree for an intelligent machine, the operation of which is based on statistical approaches for converting input data from a source type of information into a target type of information using said decision tree, said method including:
  • storing in a data base a set of application field dependent files including words and symbols, thereby constituting a corpus;
  • performing a vocabulary selection by deriving from said corpus, a list of most frequent words and symbols;
  • scanning said words and symbols, and deriving therefrom a plurality of frequencies of occurrence of n-grams, which are sequences of a predefined number "n" of words and symbols, and storing said plurality of frequencies into an n-grams table;
  • constructing said decision tree by:
    • a) putting all selected vocabulary words and symbols into a first unique class C, said class initially constituting the only element of a set of classes; then,
    • b) splitting each class of said set of classes into two subclasses C1 and C2, and assigning, through an iterative process, each word and symbol to one of said subclasses C1 and C2, based on the plurality of frequencies in said n-grams table;
    • c) computing, for each word and symbol "x", a distance d1 relative to subclass C1 and a distance d2 relative to subclass C2, wherein said distances d1 and d2 are derived as follows:
      [Figure]
      wherein V is the number of words in the vocabulary, and
      [Figure]
      wherein C is a counter of the n-grams $x_1, \ldots, x_{n-1}, y$, the summation is taken over all contexts $(x_1, \ldots, x_{n-1})$ such that $x_j = x$, and $N_{Total}$ is the size of the class to be partitioned;
      [Figure]
      the summation in the numerator being taken over all contexts $(x_1, \ldots, x_{n-1})$ where $x_j$ belongs to C1, and the summation in the denominator being taken over all contexts where $x_j$ belongs to C1 and over all possible values of z from 0 to V-1;
      [Figure]
      the summation in the numerator being taken over all contexts $(x_1, \ldots, x_{n-1})$ where $x_j$ belongs to C2, and the summation in the denominator being taken over all contexts $(x_1, \ldots, x_{n-1})$ where $x_j$ belongs to C2 and over all possible values of z from 0 to V-1;
      and wherein
      $$\Phi[p] = \begin{cases} \log_2 p, & p > \varepsilon \\ \dfrac{p}{\varepsilon} - 1 + \log_2 \varepsilon, & p \le \varepsilon \end{cases}$$
      with $\varepsilon = [\min p(x,y)]^2$, where the minimum is taken over all non-zero values of $p(x,y)$, in which case $\Phi[0] = 2\log_2[\min p(x,y)] - 1$;
    • d) reclassifying "x" into whichever subclass, C1 or C2, gives the shorter of the distances d1 and d2; and
    • e) testing each subclass C1 and C2 and deciding, based on a predefined criterion, whether each class of the set of classes should be split any further; and, in case of any further split requirement, repeating said steps b) through e), thus increasing the number of elements in said set of classes.
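The first three steps of claim 1 (storing an application-dependent corpus, selecting the most frequent words and symbols, and tabulating n-gram frequencies) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not IBM's implementation: the function names `select_vocabulary` and `ngram_table`, whitespace tokenisation, and the mapping of out-of-vocabulary tokens to a hypothetical `<unk>` symbol are all assumptions not found in the claim.

```python
from collections import Counter


def select_vocabulary(corpus_tokens, size):
    """Vocabulary selection: the `size` most frequent words and symbols."""
    counts = Counter(corpus_tokens)
    return [word for word, _ in counts.most_common(size)]


def ngram_table(corpus_tokens, vocabulary, n):
    """Count every sequence of n consecutive tokens (the n-grams table).

    ASSUMPTION: tokens outside the selected vocabulary are replaced by a
    hypothetical '<unk>' symbol; the claim does not say how they are handled.
    """
    vocab = set(vocabulary)
    mapped = [tok if tok in vocab else "<unk>" for tok in corpus_tokens]
    return Counter(
        tuple(mapped[i:i + n]) for i in range(len(mapped) - n + 1)
    )


# Usage on a toy corpus (a stand-in for the application-dependent files).
corpus = "the cat sat on the mat the cat ran".split()
vocab = select_vocabulary(corpus, 3)
table = ngram_table(corpus, vocab, 2)
```

With n = 2 the table holds bigram counts keyed by `(x, y)` tuples, which is the shape a class-splitting step would consume.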
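Steps a) through e) describe a top-down binary clustering of the vocabulary. The Python sketch below uses bigram counts (n = 2, a context word x followed by y). Only Φ follows the claim exactly. Everything else is an assumption. The d1 and d2 formulas appear only as figures in this record, so the sketch substitutes a cross-entropy-style distance d_i(x) = -Σ_y p(x,y) Φ[p_i(y)]. Here p_i(y) is the next-word distribution of subclass C_i, built with the numerator and denominator sums the claim describes. The alternating initial split, taking N_Total as the total bigram count of the class, and the size/depth stopping rule for step e) are also hypothetical.

```python
import math
from collections import defaultdict


def make_phi(eps):
    """Phi from claim 1: log2 p above eps, linear below it (finite at p = 0)."""
    def phi(p):
        if p > eps:
            return math.log2(p)
        return p / eps - 1 + math.log2(eps)
    return phi


def split_class(words, bigrams, max_iter=20):
    """Steps b)-d): split `words` into two subclasses by iterative reclassification.

    bigrams maps (x, y) -> count C(x, y).
    ASSUMED distance (not the patent's figures): d_i(x) = -sum_y p(x, y) * Phi[p_i(y)].
    """
    words = list(words)
    members = set(words)
    rows = defaultdict(dict)  # rows[x][y] = C(x, y) for contexts x in this class
    total = 0                 # ASSUMED N_Total: total bigram count of the class
    for (x, y), c in bigrams.items():
        if x in members and c > 0:
            rows[x][y] = rows[x].get(y, 0) + c
            total += c
    if total == 0:
        return words, []
    p = {x: {y: c / total for y, c in ys.items()} for x, ys in rows.items()}
    min_p = min(v for ys in p.values() for v in ys.values())
    phi = make_phi(min_p ** 2)  # eps = [min non-zero p(x, y)]^2, as in the claim

    assign = {w: i % 2 for i, w in enumerate(words)}  # hypothetical initial split
    for _ in range(max_iter):
        # p_i(y): counts for contexts in C_i over counts for all z in C_i
        counts = [defaultdict(float), defaultdict(float)]
        for x, ys in rows.items():
            for y, c in ys.items():
                counts[assign[x]][y] += c
        dists = []
        for cnt in counts:
            s = sum(cnt.values())
            dists.append({y: c / s for y, c in cnt.items()} if s else {})
        changed = False
        for x in words:
            ys = p.get(x, {})
            d = [-sum(pxy * phi(dist.get(y, 0.0)) for y, pxy in ys.items())
                 for dist in dists]
            new = 0 if d[0] <= d[1] else 1  # step d): keep the shorter distance
            if new != assign[x]:
                assign[x] = new
                changed = True
        if not changed:
            break
    return ([w for w in words if assign[w] == 0],
            [w for w in words if assign[w] == 1])


def build_tree(words, bigrams, min_size=2, depth=0, max_depth=3):
    """Step e): keep splitting; leaves are lists, internal nodes are pairs.

    ASSUMED stopping criterion: class size <= min_size or depth >= max_depth.
    """
    if len(words) <= min_size or depth >= max_depth:
        return words
    c1, c2 = split_class(words, bigrams)
    if not c1 or not c2:
        return words
    return (build_tree(c1, bigrams, min_size, depth + 1, max_depth),
            build_tree(c2, bigrams, min_size, depth + 1, max_depth))
```

Because Φ is linear below ε rather than tending to minus infinity, a subclass that never produces a given successor y still yields a finite distance. The claimed definition of Φ[0] guarantees this property.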

Forward References: 30 U.S. patents reference this one.

U.S. References (backward, 4):

Patent | Pub. Date | Inventor | Assignee | Title
US4942526 | 1990-07 | Okajima et al. | Hitachi, Ltd. | Method and system for generating lexicon of cooccurrence relations in natural language
US5005203 | 1991-04 | Ney | U.S. Philips Corporation | Method of recognizing continuously spoken words
US5195167 | 1993-03 | Bahl et al. | International Business Machines Corporation | Apparatus and method of grouping utterances of a phoneme into context-dependent categories based on sound-similarity for automatic speech recognition
US5267165 | 1993-11 | Sirat | U.S. Philips Corporation | Data processing device and method for selecting data words contained in a dictionary

Foreign References:
Publication | Date | IPC Code | Assignee | Title
EP0238689 | 1986-03 | G10L 5/06 | IBM | Method and apparatus for performing acoustic matching in a speech recognition system
EP0245595 | 1987-02 | G10L 5/06 | IBM | Apparatus and method for estimating, from sparse data, the probability that a particular one of a set of events is the next event in a string of events
EP0300648 | 1988-07 | G06F 3/16 | BRITISH TELECOMM | Pattern recognition
EP0313975 | 1988-10 | G06F 15/36 | IBM | Design and construction of a binary-tree system for language modelling
EP0387602 | 1990-02 | G10L 5/06 | IBM | Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system
EP0508519 | 1992-03 | G06F 15/401 | N.V. Philips' Gloeilampenfabrieken | A method for storing bibliometric information on items from a finite source of text, and in particular document postings for use in a full-text document retrieval system

Other Abstract Info: DERABS G1994-192948

Other References:
  • Meisel et al., "Efficient Representation of Speech for Recognition," Speech Technology, vol. 5, no. 3, Feb. 1991, New York, US, pp. 96-100; p. 97, left col., paragraph 2 to p. 99, left col., paragraph 1; figures 2, 3.

Copyright © 1997-2014 Thomson Reuters