





Title: 
US5263117:
Method and apparatus for finding the best splits in a decision tree for a language model for a speech recognizer

Country: 
US United States of America

 
Inventor: 
Nadas, Arthur J.; Rock Tavern, NY
Nahamoo, David; White Plains, NY

Assignee: 
International Business Machines Corporation, Armonk, NY

Published / Filed: 
19931116 / 19891026

Application Number: 
US1989000427420

IPC Code: 
Advanced:
G06T 7/00;
G10L 11/00;
G10L 15/10;
G10L 15/18;
IPC7:
G10L 9/02;

ECLA Code: 
G10L15/197;

U.S. Class: 
Current:
704/200;
704/E15.023;
Original:
395/002;

Field of Search: 
381/04146
364/513.5

Priority Number: 
19891026 
US1989000427420 

Abstract: 
A method and apparatus for finding the best or near best binary classification of a set of observed events, according to a predictor feature X so as to minimize the uncertainty in the value of a category feature Y. Each feature has three or more possible values. First, the predictor feature value and the category feature value of each event are measured. The events are then split, arbitrarily, into two sets of predictor feature values. From the two sets of predictor feature values, an optimum pair of sets of category feature values is found having the lowest uncertainty in the value of the predictor feature. From the two optimum sets of category feature values, an optimum pair of sets is found having the lowest uncertainty in the value of the category feature. An event is then classified according to whether its predictor feature value is a member of a set of optimal predictor feature values.
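
As a rough illustration of the "uncertainty" that the abstract seeks to minimize, the following sketch assumes it is measured as Shannon conditional entropy in bits. That choice, the function name `split_uncertainty`, and the layout of the `joint` table are assumptions made for this example and are not taken from the record above.

```python
import math

def split_uncertainty(joint, x_set):
    """Conditional entropy (bits) of Y given which side of a binary split X is on.

    joint[m][n] is an estimated probability P(X_m, Y_n) (hypothetical layout);
    x_set holds the indices m placed on one side of the split.
    """
    n_vals = len(joint[0])
    total = 0.0
    for side in (True, False):
        # Joint probability of (this side of the split, Y_n) for every n.
        p_side_y = [
            sum(row[n] for m, row in enumerate(joint) if (m in x_set) == side)
            for n in range(n_vals)
        ]
        p_side = sum(p_side_y)
        for p in p_side_y:
            if p > 0.0:
                total -= p * math.log2(p / p_side)
    return total

# Toy table: X and Y each take three values and always agree.
joint = [[1 / 3, 0.0, 0.0],
         [0.0, 1 / 3, 0.0],
         [0.0, 0.0, 1 / 3]]
print(split_uncertainty(joint, {0}))    # splitting off X_0 reduces uncertainty
print(split_uncertainty(joint, set()))  # an empty split leaves it unchanged
```

Under this assumed measure, a split is better the lower the returned value, which is how the "lowest uncertainty" criterion in the abstract would be compared across candidate splits.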

Attorney, Agent or Firm: 
Schechter, Marc D. ;

Primary / Asst. Examiners: 
Fleming, Michael R.; Doerrler, Michelle

Designated Country: 
AT BE CH DE ES FR GB IT LI NL SE

Family: 
8 known family members

First Claim:
We claim:
1. A method of automatic speech recognition comprising the steps of:
 converting an utterance into an utterance signal representing the utterance, said utterance comprising a series of at least a predictor word and a predicted word, said utterance signal comprising at least one predictor word signal representing the predictor word;
 providing a set of M predictor feature signals, each predictor feature signal having a predictor feature value X_{m}, where M is an integer greater than or equal to three and m is an integer greater than zero and less than or equal to M, each predictor feature signal in the set representing a different word;
 generating a decision set which contains a subset of the M predictor feature signals representing the words;
 comparing the predictor word signal with the predictor feature signals in the decision set;
 outputting a first category feature signal representing a first predicted word if the predictor word signal is a member of the decision set, said first category feature signal being one of N category feature signals, each category feature signal representing a different word and having a category feature value Y_{n}, where N is an integer greater than or equal to three, and n is an integer greater than zero and less than or equal to N; and
 outputting a second category feature signal, different from the first category feature signal and representing a second predicted word different from the first predicted word if the predictor word signal is not a member of the decision set;
 characterized in that the contents of the decision set are generated by the steps of:
 providing a training text comprising a set of observed events, each event having a predictor feature X representing a predictor word and a category feature Y representing a predicted word, said predictor feature having one of M different possible values X_{m}, each X_{m} representing a different predictor word, said category feature having one of N possible values Y_{n}, each Y_{n} representing a different predicted word;
 (a) measuring the predictor feature value X_{m} and the category feature value Y_{n} of each event in the set of events;
 (b) estimating, from the measured predictor feature values and the measured category feature values, the probability P(X_{m}, Y_{n}) of occurrence of an event having a category feature value Y_{n} and a predictor feature value X_{m}, for each Y_{n} and each X_{m};
 (c) selecting a starting set SX_{opt}(t) of predictor feature values X_{m}, where t has an initial value;
 (d) calculating, from the estimated probabilities P(X_{m}, Y_{n}), the conditional probability P(SX_{opt}(t) | Y_{n}) that the predictor feature has a value in the set SX_{opt}(t) when the category feature has a value Y_{n}, for each Y_{n};
 (e) defining a number of pairs of sets SY_{j}(t) and \overline{SY}_{j}(t) of category feature values Y_{n}, where j is an integer greater than zero and less than or equal to (N-1), each set SY_{j}(t) containing only those category feature values Y_{n} having the j lowest values of P(SX_{opt}(t) | Y_{n}), each set \overline{SY}_{j}(t) containing only those category feature values Y_{n} having the (N-j) highest values of P(SX_{opt}(t) | Y_{n});
 (f) finding a pair of sets SY_{opt}(t) and \overline{SY}_{opt}(t) from among the pairs of sets SY_{j}(t) and \overline{SY}_{j}(t) such that the pair of sets SY_{opt}(t) and \overline{SY}_{opt}(t) have the lowest uncertainty in the value of the predictor feature;
 (g) calculating, from the estimated probabilities P(X_{m}, Y_{n}), the conditional probability P(SY_{opt}(t) | X_{m}) that the category feature has a value in the set SY_{opt}(t) when the predictor feature has a value X_{m}, for each X_{m};
 (h) defining a number of pairs of sets SX_{i}(t+1) and \overline{SX}_{i}(t+1) of predictor feature values X_{m}, where i is an integer greater than zero and less than or equal to (M-1), each set SX_{i}(t+1) containing only those predictor feature values X_{m} having the i lowest values of P(SY_{opt}(t) | X_{m}), each set \overline{SX}_{i}(t+1) containing only those predictor feature values X_{m} having the (M-i) highest values of P(SY_{opt}(t) | X_{m});
 (i) finding a pair of sets SX_{opt}(t+1) and \overline{SX}_{opt}(t+1) from among the pairs of sets SX_{i}(t+1) and \overline{SX}_{i}(t+1) such that the pair of sets SX_{opt}(t+1) and \overline{SX}_{opt}(t+1) have the lowest uncertainty in the value of the category feature; and
 (1) setting the decision set equal to the set SX_{opt} (t+1).
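
Steps (d) through (i) of the claim can be sketched in a few lines of Python. This is only an illustrative reading of the claim, not the patented implementation: it assumes uncertainty is measured as Shannon conditional entropy, takes the joint probabilities of step (b) as a ready-made table `joint[m][n]`, and performs a single pass from t to t+1. The helper names `cond_entropy`, `refine`, and `decision_set` are hypothetical.

```python
import math

def transpose(joint):
    return [list(col) for col in zip(*joint)]

def cond_entropy(joint, col_set):
    """Entropy (bits) of the row variable given whether the column is in col_set."""
    total = 0.0
    for side in (True, False):
        p_row = [sum(p for c, p in enumerate(row) if (c in col_set) == side)
                 for row in joint]
        p_side = sum(p_row)
        for p in p_row:
            if p > 0.0:
                total -= p * math.log2(p / p_side)
    return total

def refine(joint, row_set):
    """Given a set of row values, pick the best set of column values.

    Columns are ranked by P(row in row_set | column); each candidate set holds
    the j lowest-ranked columns (j = 1 .. n_cols - 1), and the candidate that
    gives the lowest entropy of the row variable is returned.
    """
    n_cols = len(joint[0])

    def p_rowset_given_col(c):
        col_total = sum(row[c] for row in joint)
        in_set = sum(joint[r][c] for r in row_set)
        return in_set / col_total if col_total > 0.0 else 0.0

    order = sorted(range(n_cols), key=p_rowset_given_col)
    candidates = [set(order[:j]) for j in range(1, n_cols)]
    return min(candidates, key=lambda s: cond_entropy(joint, s))

def decision_set(joint, start_x_set):
    """One pass: SX_opt(t) -> SY_opt(t) -> SX_opt(t+1)."""
    sy_opt = refine(joint, start_x_set)         # steps (d)-(f): rows are X
    return refine(transpose(joint), sy_opt)     # steps (g)-(i): rows are Y

# Toy table: X_0, X_1 co-occur only with Y_0, Y_1; X_2, X_3 only with Y_2, Y_3.
joint = [[0.125 if m // 2 == n // 2 else 0.0 for n in range(4)] for m in range(4)]
print(decision_set(joint, {0}))
```

Because each candidate set is a prefix of a sorted list, only N-1 (or M-1) splits are scored per step rather than every possible subset. At recognition time, the claim's membership test would then reduce to checking whether the predictor word's index is in the returned set.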

Forward References: 
23 U.S. patent(s) reference this one

