Title: |
US6188976:
Apparatus and method for building domain-specific language models

|
Country: |
US United States of America

|
| |
Inventor: |
Ramaswamy, Ganesh N.; Ossining, NY
Printz, Harry W.; New York, NY
Gopalakrishnan, Ponani S.; Yorktown Heights, NY

|
Assignee: |
International Business Machines Corporation, Armonk, NY

|
Published / Filed: |
2001-02-13 / 1998-10-23

|
Application Number: |
US1998000178026

|
IPC Code: |
Advanced:
G10L 15/18;
Core:
G10L 15/00;
IPC-7:
G06F 17/20;
G06F 17/27;
G10L 15/00;

|
ECLA Code: |
G10L15/18C;

|
U.S. Class: |
Current:
704/009;
704/001;
704/255;
704/E15.019;
Original:
704/009;
704/001;
704/255;

|
Field of Search: |
704/001,9-10,255,256,257,265

|
Priority Number: |
1998-10-23 | US1998000178026

|
Abstract: |
Disclosed is a method and apparatus for building a domain-specific language model for use in language processing applications, e.g., speech recognition. A reference language model is generated based on a relatively small seed corpus containing linguistic units relevant to the domain. An external corpus containing a large number of linguistic units is accessed. Using the reference language model, linguistic units which have a sufficient degree of relevance to the domain are extracted from the external corpus. The reference language model is then updated based on the seed corpus and the extracted linguistic units. The process may be repeated iteratively until the language model is of satisfactory quality. The language building technique may be further enhanced by combining it with mixture modeling or class-based modeling.

|
Attorney, Agent or Firm: |
F. Chau & Associates, LLP ;

|
Primary / Asst. Examiners: |
Isen, Forester W.; Edouard, Patrick N.

|
Family: |
None

|
First Claim (of 21 claims): |
What is claimed is:
1. A method for building a language model specific to a domain, comprising the steps of:
- a) building a reference language model based on a seed corpus containing linguistic units relevant to said domain;
- b) accessing an external corpus containing a large number of linguistic units;
- c) using said reference language model, selectively extracting linguistic units from said external corpus that have a sufficient degree of relevance to said domain; and
- d) updating said reference language model based on said seed corpus and said extracted linguistic units.
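
Steps a) through d) of the claim, together with the iteration mentioned in the abstract, can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the add-one-smoothed unigram model, the use of per-sentence perplexity as the relevance measure, the fixed `threshold`, and the stop-when-the-selection-stops-changing rule are all assumptions made for the example, as are the function names.

```python
import math
from collections import Counter


def build_unigram_model(sentences):
    """Return a function word -> probability (add-one-smoothed unigram model).

    Assumption: a unigram model stands in for whatever language model is used.
    """
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words

    def prob(word):
        return (counts[word] + 1) / (total + vocab)

    return prob


def perplexity(model, sentence):
    """Per-word perplexity of a sentence under the model (lower = more relevant)."""
    words = sentence.split()
    if not words:
        return float("inf")
    log_sum = sum(math.log(model(w)) for w in words)
    return math.exp(-log_sum / len(words))


def build_domain_model(seed, external, threshold, max_iters=3):
    """Iteratively grow a domain model from a seed corpus and an external corpus."""
    model = build_unigram_model(seed)  # step a: reference model from the seed
    selected = []
    for _ in range(max_iters):
        # steps b and c: scan the external corpus, keep sufficiently relevant units
        new_selected = [s for s in external if perplexity(model, s) <= threshold]
        if new_selected == selected:
            break  # assumed stopping rule: the selection no longer changes
        selected = new_selected
        # step d: update the model from the seed plus the extracted units
        model = build_unigram_model(seed + selected)
    return model, selected


if __name__ == "__main__":
    # Hypothetical toy data for a flight-booking domain.
    seed = ["book a flight to boston", "show flights to denver",
            "book a flight to denver"]
    external = ["show a flight to boston",
                "the stock market fell sharply today",
                "book flights to boston"]
    model, selected = build_domain_model(seed, external, threshold=12.0)
    print(selected)
```

In practice the threshold would have to be tuned for the domain, and the stopping rule would be replaced by whatever quality criterion the application uses.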

|
Forward References: |
33 U.S. patents reference this one

|
Foreign References: |
None

|
Other References: |
Placeway, P., "The Estimation of Powerful Language Models From Small and Large Corpora" IEEE 1993, pp. II-33-II-36.
Masataki et al., "Task Adaptation Using Map Estimation in N-Gram Language Modeling," IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 783-786, Munich, Apr. 1997.
Crespo et al., "Language Model Adaptation for Conversational Speech Recognition Using Automatically Tagged Pseudo-Morphological Classes," IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 823-826, Munich, Apr. 1997.
Farhat et al., "Clustering Words for Statistical Language Models Based on Contextual Word Similarity," IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 180-183, Atlanta, May 1996.
Iyer et al., "Using Out-Of-Domain Data to Improve In-Domain Language Models," IEEE Signal Processing Letters, vol. 4, No. 8, pp. 221-223, Aug. 1997.
Issar, S., "Estimation of Language Models for New Spoken Language Applications," International Conference on Spoken Language Processing, vol. 2, pp. 869-872, Philadelphia, Oct. 1996.
Brown et al., "Class-Based n-gram Models of Natural Language," Computational Linguistics, vol. 18, No. 4, pp. 467-479, 1992.
