 The Delphion Integrated View

Title: US6212532: Text categorization toolkit


Country: US United States of America

12 pages

 
Inventor: Johnson, David B.; Cortlandt Manor, NY
Hampp-Bahamueller, Thomas; Tuebingen, Germany

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2001-04-03 / 1998-10-22

Application Number: US1998000176322

IPC Code: Advanced: G06F 17/30;
Core: more...
IPC-7: G06F 17/20;

ECLA Code: G06F17/30T4M;

U.S. Class: Current: 715/236; 707/003; 707/E17.091; 715/276;
Original: 707/500; 707/003;

Field of Search: 707/500,501,530,3,4,5 382/176

Priority Number:
1998-10-22  US1998000176322

Abstract:     A module information extraction system capable of extracting information from natural language documents. The system includes a plurality of interchangeable modules including a data preparation module for preparing a first set of raw data having class labels to be tested, the data preparation module being selected from a first type of the interchangeable modules. The system further includes a feature extraction module for extracting features from the raw data received from the data preparation module and storing the features in a vector format, the feature extraction module being selected from a second type of the interchangeable modules. A core classification module is also provided for applying a learning algorithm to the stored vector format and producing therefrom a resulting classifier, the core classification module being selected from a third type of the interchangeable modules. A testing module compares the resulting classifier to a set of preassigned classes, where the testing module is selected from a fourth type of the interchangeable modules, where the testing module tests a second set of raw data having class labels received by the data preparation module to determine the degree to which the class labels of the second set of raw data approximately corresponds to the resulting classifier.

Attorney, Agent or Firm: McGuireWoods, LLP; Kaufman, Esq., Stephen C.

Primary / Asst. Examiners: Hong, Stephen S.;

Maintenance Status: E2 Expired

Family: None

First Claim (of 16 claims):
Having thus described our invention, what we claim as new and desire to secure by Letters Patent is as follows:     1. A module information extraction system capable of extracting information from natural language documents, the system including a plurality of interchangeable modules, the system comprising:
  • a data preparation module for preparing a first set of raw data having class labels to be tested, the data preparation module being selected from a first type of the interchangeable modules;
  • a feature extraction module for extracting features from the raw data received from the data preparation module and storing the features in a vector format, the feature extraction module being selected from a second type of the interchangeable modules;
  • a core classification module for applying a learning algorithm to the stored vector format and producing therefrom a resulting classifier, the core classification module being selected from a third type of the interchangeable modules; and
  • a testing module for comparing the resulting classifier to a set of preassigned classes, the testing module being selected from a fourth type of the interchangeable modules,
  • wherein the testing module tests a second set of raw data having class labels received by the data preparation module to determine whether the class labels of the second set of raw corresponds to the resulting classifier.
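The four module types recited in the claim can be pictured as a pipeline of swappable functions. The following is only a minimal sketch under stated assumptions, not the patented implementation: every name in it (`prepare_data`, `extract_features`, `train_classifier`, `evaluate`, `run_pipeline`) is hypothetical, the word-count features and overlap-scoring learner are stand-ins chosen for brevity, and measuring agreement as plain accuracy is an assumption about what the testing module reports.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# All names here are hypothetical illustrations; the patent record names none of them.
LabeledDoc = Tuple[str, str]            # (raw text, preassigned class label)
Vector = Dict[str, int]                 # sparse feature vector
Classifier = Callable[[Vector], str]    # maps a feature vector to a class label


def prepare_data(raw: List[LabeledDoc]) -> List[LabeledDoc]:
    """Data preparation module: normalise raw labeled text."""
    return [(text.strip().lower(), label) for text, label in raw]


def extract_features(text: str) -> Vector:
    """Feature extraction module: store word counts in a vector format."""
    return dict(Counter(text.split()))


def train_classifier(examples: List[Tuple[Vector, str]]) -> Classifier:
    """Core classification module: a stand-in learner that sums word counts per class."""
    totals: Dict[str, Counter] = {}
    for vec, label in examples:
        totals.setdefault(label, Counter()).update(vec)

    def classify(vec: Vector) -> str:
        def score(label: str) -> int:
            return sum(n * totals[label][word] for word, n in vec.items())
        # sorted() makes tie-breaking deterministic (alphabetically first label wins).
        return max(sorted(totals), key=score)

    return classify


def evaluate(classifier: Classifier, examples: List[Tuple[Vector, str]]) -> float:
    """Testing module: fraction of preassigned labels the classifier reproduces."""
    hits = sum(1 for vec, label in examples if classifier(vec) == label)
    return hits / len(examples)


def run_pipeline(train_raw, test_raw, prepare=prepare_data, extract=extract_features,
                 learn=train_classifier, test=evaluate) -> float:
    """Wire the four module types together; each argument can be swapped independently."""
    train_vecs = [(extract(text), label) for text, label in prepare(train_raw)]
    classifier = learn(train_vecs)
    test_vecs = [(extract(text), label) for text, label in prepare(test_raw)]
    return test(classifier, test_vecs)


train_raw = [("Stock prices rose", "finance"), ("Bank shares fell", "finance"),
             ("Team wins match", "sports"), ("Player scores goal", "sports")]
test_raw = [("stock shares rose", "finance"), ("team scores goal", "sports")]
print(run_pipeline(train_raw, test_raw))  # prints 1.0
```

Because each stage is passed in as an argument, a different feature extractor or learner can be substituted without touching the other stages, which is the sense in which the sketch treats the modules as interchangeable.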


U.S. References: Forward references (19) | Backward references (7)

Patent | Pub. Date | Inventor | Assignee | Title | Pages
US5050222 | 1991-09 | Lee | Eastman Kodak Company | Polygon-based technique for the automatic classification of text and graphics components from digitized paper-based forms | 22
US5117349 | 1992-05 | Tirfing et al. | Sun Microsystems, Inc. | User extensible, language sensitive database system | 22
US5371807 | 1994-12 | Register et al. | Digital Equipment Corporation | Method and apparatus for text classification | 20
US5873056 | 1999-02 | Liddy et al. | The Syracuse University | Natural language processing system for semantic vector representation which accounts for lexical ambiguity | 21
US6047277 | 2000-04 | Parry et al. | | Self-organizing neural network for plain text categorization | 18
US6105023 | 2000-08 | Callan | Dataware Technologies, Inc. | System and method for filtering a document stream | 10
US6137911 | 2000-10 | Zhilyaev | The Dialog Corporation PLC | Test classification system and method | 23

Foreign References: None

Other References:
  • Salton et al., "Automatic structuring and retrieval of large text files"; Commun. ACM 37, 2 (Feb. 1994), pp. 97-108.*
  • Riloff, "Using cases to represent context for text classification"; Proceedings of the second international conference on Information and knowledge management, 1993, pp. 105-113.*
  • Hoch, "Using IR techniques for text classification in document analysis"; Proceedings of the seventeenth annual international ACM-SIGIR conference on Research and development in information retrieval, 1994, pp. 31-40.

