

 The Delphion Integrated View

Title: US6560597: Concept decomposition using clustering


Country: US United States of America

Pages: 14

 
Inventor: Dhillon, Inderjit Singh; Austin, TX
Modha, Dharmendra Shantilal; San Jose, CA

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2003-05-06 / 2000-03-21

Application Number: US2000000528941

IPC Code: Advanced: G06F 17/30;
Core: more...
IPC-7: G06F 17/30;

ECLA Code: G06F17/30T4M;

U.S. Class: 707/004; 707/006; 707/102;

Field of Search: 707/004,5,3,102,6 709/201

Priority Number:
2000-03-21  US2000000528941

Abstract:     A system and method operates with a document collection in which documents are represented as normalized document vectors. The document vector space is partitioned into a set of disjoint clusters and a concept vector is computed for each partition, the concept vector comprising the mean vector of all the documents in each partition. Documents are then reassigned to the cluster having their closest concept vector, and a new set of concept vectors for the new partitioning is computed. From an initial partitioning, the concept vectors are iteratively calculated to a stopping threshold value, leaving a concept vector subspace of the document vectors. The documents can then be projected onto the concept vector subspace to be represented as a linear combination of the concept vectors, thereby reducing the dimensionality of the document space. A search query can be received for the content of text documents and a search can then be performed on the projected document vectors to identify text documents that correspond to the search query.
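The iterative procedure in the abstract can be sketched in Python with NumPy. This is a minimal illustration, not the patented implementation. The function names, the `init_labels`, `tol`, `max_iter`, and `seed` parameters, the use of cosine similarity as the closeness measure, the rescaling of each cluster mean to unit length, and the stopping rule (improvement in total similarity at or below `tol`) are all assumptions made for this sketch.

```python
import numpy as np

def normalize_rows(M):
    """Scale each row to unit Euclidean length; all-zero rows stay zero."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0
    return M / norms

def concept_vectors(X, k, init_labels=None, tol=1e-4, max_iter=100, seed=0):
    """Cluster the rows of X (n documents x d words) into k partitions.

    Returns (labels, C): labels[i] is the cluster of document i and
    C is a k x d matrix whose rows are the concept vectors.
    Assumes max_iter >= 1.
    """
    X = normalize_rows(np.asarray(X, dtype=float))
    n, d = X.shape
    if init_labels is None:
        # Assumption: a random initial partitioning when none is supplied.
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init_labels)
    prev_quality = -np.inf
    for _ in range(max_iter):
        # Concept vector = mean of the documents in each partition,
        # rescaled to unit length (an assumption of this sketch).
        C = np.zeros((k, d))
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                C[j] = members.mean(axis=0)
        C = normalize_rows(C)
        # Reassign each document to its closest concept vector.
        sims = X @ C.T                  # cosine similarities, n x k
        labels = sims.argmax(axis=1)
        # Stop once the total similarity no longer improves by more than tol.
        quality = sims.max(axis=1).sum()
        if quality - prev_quality <= tol:
            break
        prev_quality = quality
    return labels, C
```

Because the result depends on the initial partitioning, different starting labels can lead to different final clusters; `init_labels` lets a caller fix the starting point.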

Attorney, Agent or Firm: Hall, David A.; Heller Ehrman White & McAuliffe

Primary / Asst. Examiners: Mizrahi, Diane D.;


Family: None

First Claim (1 of 27):
We claim:     1. A method of operating a computer system to represent text documents stored in a database collection, comprising:
  • representing the text documents in a vector representation format in which there are n documents and d words;
  • normalizing the document vectors;
  • determining an initial partitioning of the normalized document vectors comprising a set of k disjoint clusters and determining k cluster vectors, wherein a cluster vector comprises a mean vector of all the normalized document vectors in a partition;
  • computing a set of K concept vectors based on the initial set of cluster vectors, wherein the concept vectors define a subspace of the document vector space and wherein the subspace spans a part of the document vector space; and
  • projecting each document vector onto the subspace defined by the concept vectors, thereby defining a set of document concept decomposition vectors that represent the document vector space, with a reduced dimensionality.
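The projection step in the claim can be illustrated with a least-squares fit, which is one standard way to project a vector onto the subspace spanned by a set of vectors. The sketch below assumes the concept vectors are already available as the rows of a matrix `C`. The function names `project_onto_concepts` and `rank_documents`, and the scoring of a query against the reduced representation, are hypothetical choices for illustration rather than details taken from the patent.

```python
import numpy as np

def project_onto_concepts(X, C):
    """Project documents onto the subspace spanned by the concept vectors.

    X: n x d matrix of document vectors (one document per row).
    C: k x d matrix of concept vectors (one concept per row).
    Returns (Z, X_hat): Z is the n x k matrix of coefficients and
    X_hat = Z @ C is the n x d approximation of X in the subspace.
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    # Solve C.T @ Z.T ~= X.T in the least-squares sense.
    Zt, *_ = np.linalg.lstsq(C.T, X.T, rcond=None)
    Z = Zt.T
    return Z, Z @ C

def rank_documents(query, Z, C):
    """Return document indices ordered by similarity to a nonzero query.

    Assumption: each document is scored by the inner product between
    its approximation Z[i] @ C and the unit-length query vector.
    """
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    scores = Z @ (C @ q)   # same as (Z @ C) @ q, but only k-dimensional work
    return np.argsort(-scores)
```

Each document is then stored as k coefficients instead of d word weights, which is the dimensionality reduction the claim refers to. The concept vectors need not be mutually orthogonal, which is why the coefficients come from a least-squares solve rather than from plain inner products.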



       
U.S. References: Forward references (24)  |  Backward references (13)

Patent  |  Pub. Date  |  Inventor  |  Assignee  |  Title  |  Pages
US4674028  |  1987-06  |  Shioya et al.  |  Hitachi, Ltd.  |  Identification method of a process parameter  |  7
US5317507  |  1994-05  |  Gallant  |  (none listed)  |  Method for document retrieval and for word sense disambiguation using neural networks  |  20
US5692100  |  1997-11  |  Tsuboka et al.  |  Matsushita Electric Industrial Co., Ltd.  |  Vector quantizer  |  53
US5692107  |  1997-11  |  Simoudis et al.  |  Lockheed Missiles & Space Company, Inc.  |  Method for generating predictive models in a computer system  |  11
US5748116  |  1998-05  |  Chui et al.  |  Teralogic, Incorporated  |  System and method for nested split coding of sparse data sets  |  25
US5787274  |  1998-07  |  Agrawal et al.  |  International Business Machines Corporation  |  Data mining method and system for generating a decision tree classifier for data records based on a minimum description length (MDL) and presorting of records  |  20
US5787422  |  1998-07  |  Tukey et al.  |  Xerox Corporation  |  Method and apparatus for information access employing overlapping clusters  |  17
US5794235  |  1998-08  |  Chess  |  International Business Machines Corporation  |  System and method for dynamic retrieval of relevant information by monitoring active data streams  |  5
US5886651  |  1999-03  |  Chui et al.  |  Teralogic, Inc.  |  System and method for nested split coding of sparse data sets  |  25
US5893100  |  1999-04  |  Chui et al.  |  Teralogic, Incorporated  |  System and method for tree ordered coding of sparse data sets  |  22
US5999927  |  1999-12  |  Tukey et al.  |  Xerox Corporation  |  Method and apparatus for information access employing overlapping clusters  |  17
US6356898  |  2002-03  |  Cohen et al.  |  International Business Machines Corporation  |  Method and system for summarizing topics of documents browsed by a user  |  16
US6360227  |  2002-03  |  Aggarwal et al.  |  International Business Machines Corporation  |  System and method for generating taxonomies with applications to content-based recommendations  |  22
       
Foreign References: None

Other Abstract Info: DERABS C2003-539858

Other References:
  • Drineas et al., "Clustering in large graphs and matrices," SODA pp. 291-299 (1999).
  • Duda et al., "Pattern Classification and Scene Analysis," John Wiley & Sons pp. 211-228 and pp. 252-256 (1973).
  • Kleinberg et al., "A Microeconomic View of Data Mining," Department of Computer Science, Cornell University pp. 1-14 (1998).
  • Sahami et al., "Real-time Full-text Clustering of Networked Documents," Stanford University pp. 1-3 (1998).


    Copyright © 1997-2010 Thomson Reuters