 The Delphion Integrated View

       
Title: US6424971: System and method for interactive classification and analysis of data


Country: US United States of America

Pages: 15

 
Inventor: Kreulen, Jeffrey Thomas; San Jose, CA
Modha, Dharmendra Shantilal; San Jose, CA
Spangler, William Scott; San Martin, CA
Strong, Jr., Hovey Raymond; San Jose, CA

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2002-07-23 / 1999-10-29

Application Number: US1999000429650

IPC Code: Advanced: G06F 17/30;
Core: more...
IPC-7: G06F 17/30;

ECLA Code: G06F17/30T4M;

U.S. Class: 707/007; 707/101; 707/006; 707/004;

Field of Search: 707/007,6,2,4,101,3,5 345/440,427 704/009,2,8 382/225,224,228 714/026,47 706/016

Priority Number:
1999-10-29  US1999000429650

Abstract: A system, method, and computer program product for interactively classifying and analyzing data is particularly applicable to classification and analysis of textual data. It is particularly useful in identification of helpdesk inquiry and problem categories amenable to automated fulfillment or solution. A dictionary is generated based on a frequency of occurrence of words in a document set. A count of occurrences of each word in the dictionary within each document in the document set is generated. The set of documents is partitioned into a plurality of clusters. A name, a centroid, a cohesion score, and a distinctness score are generated for each cluster and displayed in a table. The documents contained in the clusters are sorted based on their similarity to other documents in the cluster. The similarity may be determined by calculating the distance of the document to the centroid of the cluster, and the documents may be sorted in order of ascending or descending distance of the document to the centroid of the cluster. Editing input may be received from a user and the displayed table modified based on the received editing input: clusters may be split or deleted. The helpdesk application area is only one of many areas to which the present invention may be advantageously applied. One of ordinary skill in the art would recognize that any set of text documents may be classified and subsequently analyzed using the present invention.
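
The sorting step in the abstract (ranking a cluster's documents by distance to the cluster centroid) can be illustrated with a minimal Python sketch. This is not the patent's implementation: the Euclidean metric, the function names (`centroid`, `euclidean`, `sort_by_distance`), and the sample count vectors are all assumptions made for illustration, since the abstract does not fix a particular distance measure.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length word-count vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance (an assumed metric; the abstract names none)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sort_by_distance(cluster, descending=False):
    """Rank a cluster's documents by distance to the cluster centroid.

    cluster: dict mapping a document id to its word-count vector.
    Returns a list of (doc_id, distance) pairs.
    """
    c = centroid(list(cluster.values()))
    ranked = [(doc_id, euclidean(vec, c)) for doc_id, vec in cluster.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=descending)

# Hypothetical cluster: d1 and d2 are identical, d3 is the outlier.
cluster = {"d1": [2, 0, 1], "d2": [2, 0, 1], "d3": [0, 4, 1]}
for doc_id, dist in sort_by_distance(cluster):
    print(f"{doc_id}  {dist:.3f}")
```

Ascending order surfaces the most typical documents of a cluster first; passing `descending=True` surfaces the outliers first, which may be the more useful view when deciding whether a cluster should be split.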

Attorney, Agent or Firm: Morgan & Finnegan ; Tran, Khanh Q. ;

Primary / Asst. Examiners: Shah, Sanjiv;


Family: None

First Claim (of 48 claims):
What is claimed is: 1. A method for interactive classification and analysis comprising the steps of:
  • generating a dictionary including a subset of words contained in a document set based on a frequency of occurrence of each word in the document set;
  • generating a count of occurrences of each word in the dictionary within each document in the document set;
  • partitioning the set of documents into a plurality of clusters, each cluster containing at least one document;
  • generating a name for each cluster;
  • generating a centroid of each cluster in the space of the dictionary;
  • generating a cohesion score for each cluster;
  • generating a distinctness score for each cluster; and
  • displaying a table including the name of each cluster and the cohesion score and distinctness score for each cluster.
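
The steps of claim 1 can be walked through end to end in a short Python sketch. It is an illustration under stated assumptions, not the claimed method itself: the claim does not say how documents are partitioned or how the scores are computed, so the sketch groups each document by its most frequent dictionary word as a stand-in for a real clustering algorithm (k-means would be a common choice), reuses that word as the cluster name, takes cohesion to be the mean cosine similarity of members to their centroid, and takes distinctness to be one minus the highest cosine similarity to any other centroid. All function names and the sample helpdesk texts are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_dictionary(docs, size):
    """Keep the `size` most frequent words across the whole document set."""
    totals = Counter(word for doc in docs for word in doc.lower().split())
    return [word for word, _ in totals.most_common(size)]

def count_vector(doc, dictionary):
    """Occurrences of each dictionary word within one document."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in dictionary]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Centroid of a cluster in the space of the dictionary."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(docs, size=4):
    """Return one (name, size, cohesion, distinctness) row per cluster."""
    dictionary = build_dictionary(docs, size)
    vectors = [count_vector(doc, dictionary) for doc in docs]

    # ASSUMPTION: stand-in partitioning by each document's dominant word.
    clusters = defaultdict(list)
    for i, vec in enumerate(vectors):
        clusters[dictionary[vec.index(max(vec))]].append(i)

    centroids = {name: mean_vector([vectors[i] for i in members])
                 for name, members in clusters.items()}

    rows = []
    for name, members in clusters.items():
        c = centroids[name]
        # ASSUMPTION: cohesion = mean similarity of members to the centroid.
        cohesion = sum(cosine(vectors[i], c) for i in members) / len(members)
        # ASSUMPTION: distinctness = 1 - similarity to the nearest other centroid.
        others = [cosine(c, o) for n, o in centroids.items() if n != name]
        distinctness = 1.0 - max(others, default=0.0)
        rows.append((name, len(members), cohesion, distinctness))
    return rows

def print_table(rows):
    """Display the cluster table described in the final step of the claim."""
    print(f"{'Name':<12}{'Size':>6}{'Cohesion':>10}{'Distinct':>10}")
    for name, size, cohesion, distinctness in rows:
        print(f"{name:<12}{size:>6}{cohesion:>10.3f}{distinctness:>10.3f}")

# Hypothetical helpdesk tickets.
docs = [
    "password reset password locked",
    "reset my password",
    "printer jam printer offline",
    "printer driver install",
]
rows = classify(docs, size=3)
print_table(rows)
```

Under these assumed definitions, a cluster with low cohesion is a candidate for splitting, and a cluster with low distinctness is a candidate for merging with or being absorbed by a neighbor, which is the kind of decision the displayed table is meant to support.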



Forward References: 13 U.S. patents reference this one

       
U.S. References: 9 backward references

Patent | Pages | Pub. Date | Inventor | Assignee | Title
US5483637 | 11 | 1996-01 | Winokur et al. | International Business Machines Corporation | Expert based system and method for managing error events in a local area network
US5600791 | 19 | 1997-02 | Carlson et al. | International Business Machines Corporation | Distributed device status in a clustered system environment
US5703964 | 35 | 1997-12 | Menon et al. | Massachusetts Institute of Technology | Pattern recognition system with statistical classification
US5822741 | 10 | 1998-10 | Fischthal | Lockheed Martin Corporation | Neural network/conceptual clustering fraud detection architecture
US5857179 | 18 | 1999-01 | Vaithyanathan et al. | Digital Equipment Corporation | Computer method and apparatus for clustering documents and automatic generation of cluster keywords
US5864855 | 17 | 1999-01 | Ruocco et al. | The United States of America as represented by the Secretary of the Army | Parallel document clustering process
US6100901 | 26 | 2000-08 | Mohda et al. | International Business Machines Corporation | Method and apparatus for cluster exploration and visualization
US6137911 | 23 | 2000-10 | Zhilyeav | The Dialog Corporation PLC | Test classification system and method
US6269376 | 20 | 2001-07 | Dhillon et al. | International Business Machines Corporation | Method and system for clustering data in parallel in a distributed-memory multiprocessor system
       
Foreign References: None
