 The Delphion Integrated View

Title: US6327561: Customized tokenization of domain specific text via rules corresponding to a speech recognition vocabulary


Country: US United States of America

10 pages

 
Inventor: Smith, Maria E.; Plantation, FL
Grainger, Bernard John; Winchester, United Kingdom
Crepy, Hubert; Boulogne, France
Herzog, Martin; Griesheim, Germany
Backfried, Gerhard; Purkersdorf, Austria

Assignee: International Business Machines Corp., Armonk, NY

Published / Filed: 2001-12-04 / 1999-07-07

Application Number: US1999000348516

IPC Code: Advanced: G06F 17/27;
Core: more...
IPC-7: G06F 17/27; G10L 15/18;

ECLA Code: G06F17/27R2;

U.S. Class: Current: 704/009; 704/010; 704/257;
Original: 704/009; 704/010; 704/257;

Field of Search: 704/009,10,1,231,251,255,257,270 707/531,532,533,1,5,6

Priority Number:
1999-07-07  US1999000348516

Abstract: A method for supporting customized tokenization of domain-specific text comprises the steps of: loading domain-specific tokenization rules corresponding to the customized tokenization of the domain-specific text; tokenizing the domain-specific text using the loaded domain-specific tokenization rules; and, further tokenizing the domain-specific text using general purpose tokenization rules. The loading step of the inventive method can comprise: loading a speech recognition vocabulary; and, loading domain-specific tokenization rules corresponding to the speech recognition vocabulary. In addition, the tokenizing step can comprise identifying each substring in the domain-specific text matching a regular expression having a corresponding replacement pattern in the loaded domain-specific tokenization rules, and replacing each substring identified in the identifying step with the replacement pattern corresponding to the matched regular expression. Alternatively, the tokenizing step can comprise identifying substrings in the domain-specific text matching a regular expression having a corresponding replacement pattern in the second loaded domain-specific tokenization rules; excluding from further processing the identified substrings having a do-not-replace marker associated with the identified substring; and, replacing each non-excluded identified substring with the replacement pattern corresponding to the matched regular expression.
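The rule-matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Rule` class, the sample medical-dictation rules, and the `[[...]]` do-not-replace marker syntax are all assumptions made for illustration, since the record does not say how rules or markers are encoded.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule: a regular expression plus its replacement pattern."""
    pattern: str
    replacement: str  # may contain backreferences such as \1


# Hypothetical domain-specific rules (medical dictation), for illustration only.
RULES = [
    Rule(r"(\d+)\s*mg\b", r"\1 milligrams"),
    Rule(r"\bb\.i\.d\.", "twice a day"),
]

# Assumed marker syntax: text wrapped in [[...]] must not be replaced.
PROTECTED = re.compile(r"\[\[.*?\]\]")


def apply_rules(text, rules):
    """Replace every rule match unless it lies inside a do-not-replace span."""
    for rule in rules:
        regex = re.compile(rule.pattern)
        # Recompute protected spans each pass, because earlier
        # replacements may have shifted character offsets.
        protected = [m.span() for m in PROTECTED.finditer(text)]

        def substitute(match):
            inside = any(start <= match.start() and match.end() <= end
                         for start, end in protected)
            # Excluded matches are returned unchanged; others are expanded
            # with the rule's replacement pattern.
            return match.group(0) if inside else match.expand(rule.replacement)

        text = regex.sub(substitute, text)
    return text
```

Under these assumptions, `apply_rules("take 50mg b.i.d.", RULES)` rewrites both substrings, while `apply_rules("take [[50mg]] b.i.d.", RULES)` leaves the marked dose untouched and rewrites only the abbreviation. The sketch leaves the markers in the text; whether they are stripped afterwards is not stated in the record.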

Attorney, Agent or Firm: Akerman Senterfitt ;

Primary / Asst. Examiners: Thomas, Joseph;


Family: None

First Claim:
(first of 17 claims)
What is claimed is:     1. A method for supporting customized tokenization of a segment of domain-specific text comprising the steps of:
  • loading domain-specific tokenization rules corresponding to said customized tokenization of said segment of domain-specific text;
  • fully tokenizing said segment of domain-specific text using said loaded domain-specific tokenization rules; and,
  • further fully tokenizing said fully tokenized segment of domain-specific text using general purpose tokenization rules.
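The ordering in the claim (domain-specific rules first, general purpose rules second) can be sketched as a two-pass pipeline. This is an illustration under stated assumptions rather than the claimed method itself: `DOMAIN_RULES`, `domain_pass`, `general_pass`, and the simple word/punctuation splitter standing in for "general purpose tokenization rules" are all hypothetical.

```python
import re

# Hypothetical domain rule: expand a dosing abbreviation to one token.
DOMAIN_RULES = [(re.compile(r"\bb\.i\.d\."), "twice-a-day")]


def domain_pass(text):
    """First pass: apply every domain-specific (regex, replacement) rule."""
    for regex, replacement in DOMAIN_RULES:
        text = regex.sub(replacement, text)
    return text


def general_pass(text):
    """Second pass (assumed general-purpose rules): words, optionally
    hyphenated, or single punctuation characters."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)


def tokenize(text):
    """Domain-specific pass first, then the general-purpose pass."""
    return general_pass(domain_pass(text))
```

With these assumptions, `tokenize("Take aspirin b.i.d., daily.")` keeps the abbreviation as the single token `twice-a-day`, whereas running `general_pass` alone would split it into `b`, `.`, `i`, `.`, `d`, `.`. That difference is one plausible reason for applying domain rules before general ones.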


Forward References: 38 U.S. patents reference this one

       
U.S. References: Forward references (38) | Backward references (8)

Patent | Pub. Date | Inventor | Assignee | Title | Pages
US4991094 | 1991-02 | Fagan et al. | International Business Machines Corporation | Method for language-independent text tokenization using a character categorization | 19
US5687384 | 1997-11 | Nagese | Fujitsu Limited | Parsing system | 33
US5721939 | 1998-02 | Kaplan | Xerox Corporation | Method and apparatus for tokenizing text | 27
US5774888 | 1998-06 | Light | Intel Corporation | Method for characterizing a document set using evaluation surrogates | 14
US5890103 | 1999-03 | Carus | Lernout & Hauspie Speech Products N.V. | Method and apparatus for improved tokenization of natural language text | 51
US5937422 | 1999-08 | Nelson et al. | The United States of America as represented by the National Security Agency | Automatically generating a topic description for text and searching and sorting text by topic using the same | 11
US5960384 | 1999-09 | Brash | (none listed) | Method and device for parsing natural language sentences and other sequential symbolic expressions | 35
US6125377 | 2000-09 | Razin | Expert Ease Development, Ltd. | Method and apparatus for proofreading a document using a computer system which detects inconsistencies in style | 12
       
Foreign References:

Publication | Date | IPC Code | Assignee | Title | Pages
EP0287310A2 | 1988-10 | G06F 15/40 | WESTINGHOUSE ELECTRIC CORP | Intelligent query system | 42

