Title: |
US6212532:
Text categorization toolkit

|
Country: |
US United States of America

|
Inventor: |
Johnson, David B.; Cortlandt Manor, NY
Hampp-Bahamueller, Thomas; Tuebingen, Germany

|
Assignee: |
International Business Machines Corporation, Armonk, NY

|
Published / Filed: |
2001-04-03
/ 1998-10-22

|
Application Number: |
US1998000176322

|
IPC Code: |
Advanced:
G06F 17/30;
IPC-7:
G06F 17/20;

|
ECLA Code: |
G06F17/30T4M;

|
U.S. Class: |
Current:
715/236;
707/003;
707/E17.091;
715/276;
Original:
707/500;
707/003;

|
Field of Search: |
707/500,501,530,3,4,5
382/176

|
Priority Number: |
1998-10-22
US1998000176322

|
Abstract: |
A module information extraction system capable of extracting information from natural language documents. The system includes a plurality of interchangeable modules including a data preparation module for preparing a first set of raw data having class labels to be tested, the data preparation module being selected from a first type of the interchangeable modules. The system further includes a feature extraction module for extracting features from the raw data received from the data preparation module and storing the features in a vector format, the feature extraction module being selected from a second type of the interchangeable modules. A core classification module is also provided for applying a learning algorithm to the stored vector format and producing therefrom a resulting classifier, the core classification module being selected from a third type of the interchangeable modules. A testing module compares the resulting classifier to a set of preassigned classes, where the testing module is selected from a fourth type of the interchangeable modules, and where the testing module tests a second set of raw data having class labels received by the data preparation module to determine the degree to which the class labels of the second set of raw data approximately correspond to the resulting classifier.

|
Attorney, Agent or Firm: |
McGuireWoods, LLP ;
Kaufman, Esq., Stephen C. ;

|
Primary / Asst. Examiners: |
Hong, Stephen S.;

|
Maintenance Status: |
E2 Expired

|
Family: |
None

|
First Claim (of 16 claims): |
Having thus described our invention, what we claim as new and desire to secure by Letters Patent is as follows:
1. A module information extraction system capable of extracting information from natural language documents, the system including a plurality of interchangeable modules, the system comprising:
- a data preparation module for preparing a first set of raw data having class labels to be tested, the data preparation module being selected from a first type of the interchangeable modules;
- a feature extraction module for extracting features from the raw data received from the data preparation module and storing the features in a vector format, the feature extraction module being selected from a second type of the interchangeable modules;
- a core classification module for applying a learning algorithm to the stored vector format and producing therefrom a resulting classifier, the core classification module being selected from a third type of the interchangeable modules; and
- a testing module for comparing the resulting classifier to a set of preassigned classes, the testing module being selected from a fourth type of the interchangeable modules,
- wherein the testing module tests a second set of raw data having class labels received by the data preparation module to determine whether the class labels of the second set of raw data correspond to the resulting classifier.
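
The four-module arrangement recited in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation: every function name, the tab-separated input format, the word-count features, and the word-voting learner are assumptions chosen only to show how one module of each type might be swapped for another without touching the rest of the pipeline.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Labeled = Tuple[str, str]      # (class label, raw text)
Vector = Dict[str, int]        # sparse feature vector: feature -> count
Classifier = Callable[[Vector], str]


def prepare_tab_separated(raw: List[str]) -> List[Labeled]:
    """Data preparation module (hypothetical): parse 'label<TAB>text' lines."""
    prepared = []
    for line in raw:
        label, text = line.split("\t", 1)
        prepared.append((label.strip(), text.strip()))
    return prepared


def extract_word_counts(text: str) -> Vector:
    """Feature extraction module (hypothetical): lowercase word counts."""
    return dict(Counter(text.lower().split()))


def train_word_vote(examples: List[Tuple[str, Vector]]) -> Classifier:
    """Core classification module (hypothetical): each word votes for the
    labels it co-occurred with in the training data."""
    word_label_counts: Dict[str, Counter] = defaultdict(Counter)
    label_counts: Counter = Counter()
    for label, vec in examples:
        label_counts[label] += 1
        for word, n in vec.items():
            word_label_counts[word][label] += n
    default_label = label_counts.most_common(1)[0][0]

    def classify(vec: Vector) -> str:
        votes: Counter = Counter()
        for word, n in vec.items():
            for label, seen in word_label_counts.get(word, {}).items():
                votes[label] += n * seen
        # Fall back to the most frequent training label if no word is known.
        return votes.most_common(1)[0][0] if votes else default_label

    return classify


def accuracy(classifier: Classifier, examples: List[Tuple[str, Vector]]) -> float:
    """Testing module (hypothetical): fraction of preassigned labels matched."""
    hits = sum(1 for label, vec in examples if classifier(vec) == label)
    return hits / len(examples)


def run_pipeline(
    train_raw: List[str],
    test_raw: List[str],
    prepare=prepare_tab_separated,
    extract=extract_word_counts,
    train=train_word_vote,
    evaluate=accuracy,
) -> float:
    """Wire one module of each type together; any argument can be replaced."""
    train_set = [(label, extract(text)) for label, text in prepare(train_raw)]
    classifier = train(train_set)
    test_set = [(label, extract(text)) for label, text in prepare(test_raw)]
    return evaluate(classifier, test_set)
```

Passing a different callable for `prepare`, `extract`, `train`, or `evaluate` replaces that stage alone, which is the sense in which the modules are interchangeable in this sketch.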

|
Forward References: |
19 U.S. patents reference this one
