Title: |
US6424971:
System and method for interactive classification and analysis of data

|
Country: |
US United States of America

|
Inventor: |
Kreulen, Jeffrey Thomas; San Jose, CA
Modha, Dharmendra Shantilal; San Jose, CA
Spangler, William Scott; San Martin, CA
Strong, Jr., Hovey Raymond; San Jose, CA

|
Assignee: |
International Business Machines Corporation, Armonk, NY

|
Published / Filed: |
2002-07-23 / 1999-10-29

|
Application Number: |
US1999000429650

|
IPC Code: |
Advanced:
G06F 17/30;
IPC-7:
G06F 17/30;

|
ECLA Code: |
G06F17/30T4M;

|
U.S. Class: |
707/007;
707/101;
707/006;
707/004;

|
Field of Search: |
707/007,6,2,4,101,3,5
345/440,427
704/009,2,8
382/225,224,228
714/026,47
706/016

|
Priority Number: |
1999-10-29 | US1999000429650

|
Abstract: |
A system, method, and computer program product for interactively classifying and analyzing data are particularly applicable to classification and analysis of textual data. They are particularly useful in identifying helpdesk inquiry and problem categories amenable to automated fulfillment or solution. A dictionary is generated based on a frequency of occurrence of words in a document set. A count of occurrences of each word in the dictionary within each document in the document set is generated. The set of documents is partitioned into a plurality of clusters. A name, a centroid, a cohesion score, and a distinctness score are generated for each cluster and displayed in a table. The documents contained in the clusters are sorted based on their similarity to other documents in the cluster. The similarity may be determined by calculating the distance of each document to the centroid of the cluster, and the documents may be sorted in order of ascending or descending distance to the centroid. Editing input may be received from a user and the displayed table modified based on the received editing input; clusters may be split or deleted. The helpdesk application area is only one of many areas to which the present invention may be advantageously applied. One of ordinary skill in the art would recognize that any set of text documents may be classified and subsequently analyzed using the present invention.
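
The within-cluster sorting described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent text: the function names (`centroid`, `euclidean`, `sort_by_distance`) are hypothetical, and Euclidean distance is an assumption, since the abstract says only that a distance to the centroid is calculated.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length word-count vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance (an assumed metric, not stated in the abstract)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sort_by_distance(doc_ids, vectors, descending=False):
    """Order the documents of one cluster by distance to the cluster centroid.

    doc_ids and vectors are parallel lists; vectors hold per-document
    counts of dictionary words.
    """
    c = centroid(vectors)
    pairs = [(euclidean(v, c), d) for d, v in zip(doc_ids, vectors)]
    # Ascending order puts the most typical documents first;
    # descending order surfaces likely outliers first.
    pairs.sort(key=lambda p: p[0], reverse=descending)
    return [d for _, d in pairs]
```

Under these assumptions, ascending order lists the documents most representative of the cluster first, while descending order lists the documents that fit the cluster least well, which may be candidates for moving or for splitting the cluster.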

|
Attorney, Agent or Firm: |
Morgan & Finnegan ;
Tran, Khanh Q. ;

|
Primary / Asst. Examiners: |
Shah, Sanjiv;

|
Family: |
None

|
First Claim (of 48): |
What is claimed is:
1. A method for interactive classification and analysis comprising the steps of:
- generating a dictionary including a subset of words contained in a document set based on a frequency of occurrence of each word in the document set;
- generating a count of occurrences of each word in the dictionary within each document in the document set;
- partitioning the set of documents into a plurality of clusters, each cluster containing at least one document;
- generating a name for each cluster;
- generating a centroid of each cluster in the space of the dictionary;
- generating a cohesion score for each cluster;
- generating a distinctness score for each cluster; and
- displaying a table including the name of each cluster and the cohesion score and distinctness score for each cluster.
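
The steps of claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the partition is supplied by the caller as `labels` rather than computed, cohesion is assumed to be the mean cosine similarity of a cluster's documents to its centroid, distinctness is assumed to be one minus the highest cosine similarity to any other centroid, and a cluster's name is assumed to be its two highest-weighted dictionary words. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_dictionary(docs, size):
    """Keep the `size` most frequent words across the document set."""
    totals = Counter(w for d in docs for w in tokenize(d))
    # Descending frequency, then alphabetical, so the result is deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:size]]

def count_vector(doc, dictionary):
    """Occurrences of each dictionary word within one document."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in dictionary]

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def summarize(docs, labels, dict_size=10):
    """Return one (name, size, cohesion, distinctness) row per cluster."""
    dictionary = build_dictionary(docs, dict_size)
    vectors = [count_vector(d, dictionary) for d in docs]
    clusters = {}
    for label, v in zip(labels, vectors):
        clusters.setdefault(label, []).append(v)
    centroids = {k: centroid(vs) for k, vs in clusters.items()}
    rows = []
    for k, vs in clusters.items():
        c = centroids[k]
        # Assumed naming rule: the two words with the largest centroid weight.
        top = sorted(range(len(dictionary)),
                     key=lambda i: (-c[i], dictionary[i]))[:2]
        name = ",".join(dictionary[i] for i in top)
        # Assumed cohesion: mean cosine similarity to the own centroid.
        cohesion = sum(cosine(v, c) for v in vs) / len(vs)
        # Assumed distinctness: 1 minus similarity to the nearest other centroid.
        others = [cosine(c, o) for j, o in centroids.items() if j != k]
        distinctness = 1.0 - max(others) if others else 1.0
        rows.append((name, len(vs), cohesion, distinctness))
    return rows

def format_table(rows):
    """Render the rows as a plain-text table."""
    lines = [f"{'name':<20}{'size':>6}{'cohesion':>10}{'distinct':>10}"]
    for name, size, coh, dis in rows:
        lines.append(f"{name:<20}{size:>6}{coh:>10.3f}{dis:>10.3f}")
    return "\n".join(lines)
```

Calling `format_table(summarize(docs, labels))` on a list of helpdesk texts and their cluster labels yields the kind of table the claim describes. In an interactive setting, the labels would come from a clustering step and would then be revised by the user's editing input.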

|
Forward References: |
13 U.S. patents reference this one

|