 The Delphion Integrated View

Title: US6260038: Clustering mixed attribute patterns


Country: US United States of America

13 pages

 
Inventor: Martin, David C.; San Jose, CA
Modha, Dharmendra Shantilal; San Jose, CA
Vaithyanathan, Shivakumar; San Jose, CA

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2001-07-10 / 1999-09-13

Application Number: US1999000394883

IPC Code: Advanced: G06F 17/30; G06K 9/62;
IPC-7: G06F 17/30;

ECLA Code: G06F17/30S8R1; G06K9/62B1;

U.S. Class: Current: 707/007; 707/002; 707/005; 707/006; 707/101; 707/102;
Original: 707/007; 707/002; 707/005; 707/006; 707/101; 707/102;

Field of Search: 707/007,6,101,2,5,102

Priority Number:
1999-09-13  US1999000394883

Abstract:     A technique for clustering data points in a data set that is arranged as a matrix having n objects and m attributes. Each categorical attribute of the data set is converted to a 1-of-p representation of the categorical attribute. A converted data set A is formed based on the data set and the 1-of-p representation for each categorical attribute. The converted data set A is compressed using, for example, a Goal Directed Projection compression technique or a Singular Value Decomposition compression technique, to obtain q basis vectors, with q being defined to be at least m+1. The transformed data set is projected onto the q basis vectors to form a data matrix having at least one vector, with each vector having q dimensions. Lastly, a clustering technique is performed on the data matrix having vectors having q dimensions.
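
The first two steps of the abstract (1-of-p conversion and forming the converted data set A) can be illustrated with a minimal sketch. The helper names `one_of_p` and `convert`, the sample rows, and the choice to order categories alphabetically are all assumptions made for illustration; this record does not show how the patent itself orders or scales the indicator columns.

```python
def one_of_p(values):
    """Map each categorical value to a 1-of-p indicator vector.

    p is the number of distinct values; categories are sorted so the
    column order is deterministic (an illustrative choice).
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        vec = [0.0] * len(categories)
        vec[index[v]] = 1.0  # exactly one position is set per value
        rows.append(vec)
    return categories, rows


def convert(data, categorical_cols):
    """Form a converted data set A from an n x m table.

    Numeric columns are copied as floats; each categorical column is
    replaced by its p indicator columns.
    """
    n, m = len(data), len(data[0])
    A = [[] for _ in range(n)]
    for j in range(m):
        column = [row[j] for row in data]
        if j in categorical_cols:
            _, encoded = one_of_p(column)
            for i in range(n):
                A[i].extend(encoded[i])
        else:
            for i in range(n):
                A[i].append(float(column[i]))
    return A


# Hypothetical data set: n = 4 objects, m = 2 attributes (numeric, categorical).
data = [
    [1.0, "red"],
    [2.0, "blue"],
    [1.5, "green"],
    [3.0, "red"],
]
A = convert(data, categorical_cols={1})
```

Here the single categorical attribute takes p = 3 values, so each row of `A` has 1 + 3 = 4 columns, which leaves room for the q >= m + 1 = 3 basis vectors the abstract calls for.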

Attorney, Agent or Firm: Tran, Esq., Khanh Q.; Banner & Witcoff, Ltd.

Primary / Asst. Examiners: Alam, Hosain T.; Corrielus, Jean M

Maintenance Status: E2 Expired
CC Certificate of Correction issued


Family: None

First Claim (of 30 claims):
What is claimed is:
1. A method performed by a computer for clustering data points in a data set, the data set being arranged as a matrix having n objects and m attributes, the method comprising the steps of:
  • converting each categorical attribute of the data set to a 1-of-p representation of the categorical attribute;
  • forming a converted data set A based on the data set and the 1-of-p representation for each categorical attribute;
  • compressing the converted data set A to obtain q basis vectors, with q being defined to be at least m+1;
  • projecting the converted data set onto the q basis vectors to form a data matrix having at least one vector, each vector having q dimensions; and
  • performing a clustering technique on the data matrix having vectors having q dimensions.
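
The compress, project, and cluster steps of the claim can be sketched as follows, under stated assumptions. The sketch uses a plain Singular Value Decomposition (one of the two compression techniques named in the abstract) and a basic k-means loop as a stand-in for "a clustering technique"; the matrix `A`, the helper names, and k = 2 are hypothetical, and the patent's own procedure may differ in details such as scaling or centering.

```python
import numpy as np

# Hypothetical converted data set A with n = 6 objects. The original data
# had m = 2 attributes: one numeric and one categorical with p = 3 values,
# so A has 1 + 3 = 4 columns: [numeric, is_blue, is_green, is_red].
A = np.array([
    [0.10, 0, 0, 1],
    [0.20, 0, 0, 1],
    [0.15, 0, 0, 1],
    [10.0, 1, 0, 0],
    [10.2, 0, 1, 0],
    [10.1, 1, 0, 0],
], dtype=float)

m = 2
q = m + 1  # the claim requires q to be at least m + 1


def svd_basis(matrix, q):
    """Return the top-q right singular vectors as columns (shape d x q)."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[:q].T


def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, iters=100, seed=0):
    """Basic Lloyd-style k-means, used here only as an example clusterer."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(points, centers)
        # Recompute each center; keep the old one if its cluster is empty.
        updated = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return assign(points, centers), centers


basis = svd_basis(A, q)           # compress: obtain q basis vectors
Z = A @ basis                     # project: each row now has q dimensions
labels, centers = kmeans(Z, k=2)  # cluster the q-dimensional vectors
```

Clustering runs on `Z` rather than `A`, so the cost of each distance computation depends on q instead of the full width of the 1-of-p expansion, which matters when categorical attributes take many values.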


U.S. References: Forward references (8) | Backward references (10)

Patent | Pub. Date | Inventor | Assignee | Title | Pages
US5271097 | 1993-12 | Barker et al. | International Business Machines Corporation | Method and system for controlling the presentation of nested overlays utilizing image area mixing attributes | 8
US5448727 | 1995-09 | Annevelink | Hewlett-Packard Company | Domain based partitioning and reclustering of relations in object-oriented relational database management systems | 16
US5471567 | 1995-11 | Soderberg et al. | Bolt Beranek and Newman Inc. | Image element depth buffering using two buffers | 11
US5983220 | 1999-11 | Schmitt | Bizrate.Com | Supporting intuitive decision in complex multi-attributive domains using fuzzy, hierarchical expert models | 39
US6012058 | 2000-01 | Fayyad et al. | Microsoft Corporation | Scalable system for K-means clustering of large databases | 28
US6029195 | 2000-02 | Herz | (none listed) | System for customized electronic identification of desirable objects | 63
US6032146 | 2000-02 | Chadha et al. | International Business Machines Corporation | Dimension reduction for data mining application | 17
US6038574 | 2000-03 | Pitkow et al. | Xerox Corporation | Method and apparatus for clustering a collection of linked documents using co-citation analysis | 14
US6049797 | 2000-04 | Guha et al. | Lucent Technologies, Inc. | Method, apparatus and programmed medium for clustering databases with categorical attributes | 19
US6115708 | 2000-09 | Fayyad et al. | Microsoft Corporation | Method for refining the initial conditions for clustering with applications to small and large database clustering | 14
       
Foreign References: None

Other References:
  • Chaudhuri et al., "A novel multiseed nonhierarchical data clustering technique", IEEE, vol. 27, No. 5, pp. 871-877, Oct. 1997.*
  • Burd et al., "Investigating component based maintenance and the effect of software evolution: a reengineering approach using data clustering", IEEE, pp. 199-206, Jan. 1997.*
  • Liu et al., "Feature selection via discretization", IEEE, vol. 9, No. 4, pp. 642-645, Jul. 1997.*
  • J. C. Gower, "A General Coefficient of Similarity and Some of its Properties," Biometrics 27, 857-874, Dec. 1971.
  • H. Ralambondrainy, "A conceptual version of the K-means algorithm," Pattern Recognition Letters 16, 1995, pp. 1147-1157.
  • M. Berger et al., "An Algorithm for Point Clustering and Grid Generation," IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, No. 5, 1991, pp. 1278-1286.
  • D.H. Fisher, "Knowledge Acquisition Via Incremental Conceptual Clustering," Jul. 4, 1987, pp. 267-283.
  • J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," pp. 281-297.
  • R.S. Michalski, "Chapter 4: A Theory and Methodology of Inductive Learning," Machine Learning: An Artificial Intelligence Approach, Springer, New York, pp. 83-134.
  • Duda and Hart, "Pattern Classification and Scene Analysis," New York: Wiley, 1973, pp. 210-257.
  • M. Ester et al., "A Database Interface for Clustering in Large Spatial Databases," Conference on Knowledge Discovery and Data, pp. 94-99.
  • S.Z. Selim et al., "K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, No. 1, Jan. 1984, pp. 81-87.
  • Z. Huang, "A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining," pp. 1-8.
  • Eui-Hong Han et al., "Clustering Based on Association Rule Hypergraphs," pp. 9-13.
  • U.M. Fayyad et al., "Conceptual Clustering in Structured Databases: a Practical Approach," KDD-95 Proceedings, pp. 180-185.
  • R.T. Ng et al., "Efficient and Effective Clustering Methods for Spatial Data Mining," Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 144-155.
  • S. D. Lee et al, "Maintenance of Discovered Association Rules: What to update?" pp. 51-58.
  • T. Zhang et al., "Birch: An Efficient Data Clustering Method for Very Large Databases," SIGMOD '96, Jun. 1996, pp. 103-114.
  • S.W. Wharton, "A generalized Histogram Clustering Scheme for Multidimensional Image Data," Pattern Recognition, vol. 16, No. 2, 1983, pp. 193-199.
  • B. Ripley, "Pattern Recognition and Neural Networks," Cambridge University Press, 1996, pp. 311-322.
  • Raghavan et al., "Latent Semantic Indexing: A Probabilistic Analysis," 1998, pp. 1-15.
  • M. Zait et al, "A Comparative Study of Clustering," pp. 1-12.

