Title: US6188976: Apparatus and method for building domain-specific language models

Country: US United States of America

11 pages

Inventor: Ramaswamy, Ganesh N.; Ossining, NY
Printz, Harry W.; New York, NY
Gopalakrishnan, Ponani S.; Yorktown Heights, NY

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2001-02-13 / 1998-10-23

Application Number: US1998000178026

IPC Code: Advanced: G10L 15/18;
IPC-7: G06F 17/20; G06F 17/27; G10L 15/00;

ECLA Code: G10L15/183; S10L15/183;

U.S. Class: Current: 704/009; 704/001; 704/255; 704/E15.019;
Original: 704/009; 704/001; 704/255;

Field of Search: 704/001,9-10,255,256,257,265

Priority Number:
1998-10-23  US1998000178026

Abstract:     Disclosed is a method and apparatus for building a domain-specific language model for use in language processing applications, e.g., speech recognition. A reference language model is generated based on a relatively small seed corpus containing linguistic units relevant to the domain. An external corpus containing a large number of linguistic units is accessed. Using the reference language model, linguistic units which have a sufficient degree of relevance to the domain are extracted from the external corpus. The reference language model is then updated based on the seed corpus and the extracted linguistic units. The process may be repeated iteratively until the language model is of satisfactory quality. The language building technique may be further enhanced by combining it with mixture modeling or class-based modeling.

Attorney, Agent or Firm: F. Chau & Associates, LLP ;

Primary / Asst. Examiners: Isen, Forester W.; Edouard, Patrick N.

Family: None

First Claim:
(First of 21 claims.)
What is claimed is:
1. A method for building a language model specific to a domain, comprising the steps of:
  • a) building a reference language model based on a seed corpus containing linguistic units relevant to said domain;
  • b) accessing an external corpus containing a large number of linguistic units;
  • c) using said reference language model, selectively extracting linguistic units from said external corpus that have a sufficient degree of relevance to said domain; and
  • d) updating said reference language model based on said seed corpus and said extracted linguistic units.
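
Steps (a) through (d) of the claim, together with the iteration mentioned in the abstract, can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the record above does not say which model type or relevance measure is used, so the add-alpha unigram model, the per-sentence perplexity test, the `threshold` value, and every function name here are hypothetical choices made for the example.

```python
import math
from collections import Counter


def build_unigram_model(sentences, alpha=1.0):
    """Return a function word -> probability (add-alpha smoothed unigram).

    Assumption: a unigram model stands in for the 'reference language model'.
    """
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserves mass for unseen words

    def prob(word):
        return (counts[word] + alpha) / (total + alpha * vocab_size)

    return prob


def perplexity(prob, sentence):
    """Per-word perplexity of a non-empty sentence under the model."""
    words = sentence.split()
    log_sum = sum(math.log(prob(w)) for w in words)
    return math.exp(-log_sum / len(words))


def build_domain_model(seed, external, threshold, max_rounds=3):
    """Iteratively grow a domain model from a seed corpus.

    Assumption: a sentence is 'sufficiently relevant' when its perplexity
    under the current reference model is at or below `threshold`.
    """
    model = build_unigram_model(seed)  # step (a): model from the seed corpus
    selected = []
    for _ in range(max_rounds):
        # steps (b) and (c): scan the external corpus, keep relevant sentences
        candidates = [
            s for s in external
            if s.split() and perplexity(model, s) <= threshold
        ]
        if candidates == selected:  # selection stopped changing
            break
        selected = candidates
        # step (d): rebuild from the seed corpus plus the extracted sentences
        model = build_unigram_model(seed + selected)
    return model, selected
```

A call such as `build_domain_model(seed_sentences, external_sentences, threshold=15.0)` returns the updated model and the sentences it kept. A real system would more likely use higher-order n-grams and a tuned threshold; the loop structure is the point of the sketch.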

Forward References: 69 U.S. patent(s) reference this one

U.S. References: Forward references (69)   |   Backward references (4)

Patent | Pub. Date | Inventor | Assignee | Title
US5444617 | 1995-08 | Merialdo | International Business Machines Corporation | Method and apparatus for adaptively generating field of application dependent language models for use in intelligent systems
US5613036 | 1997-03 | Strong | Apple Computer, Inc. | Dynamic categories for a speech recognition system
US5640487 | 1997-06 | Lau et al. | International Business Machines Corporation | Building scalable n-gram language models using maximum likelihood maximum entropy n-gram models
US5899973 | 1999-05 | Bandara et al. | International Business Machines Corporation | Method and apparatus for adapting the language model's size in a speech recognition system

Foreign References: None

Other References:
  • Placeway, P., "The Estimation of Powerful Language Models From Small and Large Corpora" IEEE 1993, pp. II-33-II-36.
  • Masataki et al., "Task Adaptation Using Map Estimation in N-Gram Language Modeling," IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 783-786, Munich, Apr. 1997.
  • Crespo et al., "Language Model Adaptation for Conversational Speech Recognition Using Automatically Tagged Pseudo-Morphological Classes," IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 823-826, Munich, Apr. 1997.
  • Farhat et al., "Clustering Words for Statistical Language Models Based on Contextual Word Similarity," IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 180-183, Atlanta, May 1996.
  • Iyer et al., "Using Out-Of-Domain Data to Improve In-Domain Language Models," IEEE Signal Processing Letters, vol. 4, No. 8, pp. 221-223, Aug. 1997.
  • Issar, S., "Estimation of Language Models for New Spoken Language Applications," International Conference on Spoken Language Processing, vol. 2, pp. 869-872, Philadelphia, Oct. 1996.
  • Brown et al., "Class-Based n-gram Models of Natural Language," Computational Linguistics, vol. 18, No. 4, pp. 467-479, 1992.
