Title:
US6260038:
Clustering mixed attribute patterns

Country:
US United States of America

Inventor:
Martin, David C.; San Jose, CA
Modha, Dharmendra Shantilal; San Jose, CA
Vaithyanathan, Shivakumar; San Jose, CA

Assignee:
International Business Machines Corporation, Armonk, NY

Published / Filed:
2001-07-10 / 1999-09-13

Application Number:
US1999000394883

IPC Code:
Advanced:
G06F 17/30;
G06K 9/62;
IPC-7:
G06F 17/30;

ECLA Code:
G06F17/30S8R1; G06K9/62B1;

U.S. Class:
Current:
707/007;
707/002;
707/005;
707/006;
707/101;
707/102;
Original:
707/007;
707/002;
707/005;
707/006;
707/101;
707/102;

Field of Search:
707/007,6,101,2,5,102

Priority Number:
1999-09-13, US1999000394883

Abstract:
A technique for clustering data points in a data set that is arranged as a matrix having n objects and m attributes. Each categorical attribute of the data set is converted to a 1-of-p representation of the categorical attribute. A converted data set A is formed based on the data set and the 1-of-p representation for each categorical attribute. The converted data set A is compressed using, for example, a Goal Directed Projection compression technique or a Singular Value Decomposition compression technique, to obtain q basis vectors, with q being defined to be at least m+1. The transformed data set is projected onto the q basis vectors to form a data matrix having at least one vector, with each vector having q dimensions. Lastly, a clustering technique is performed on the data matrix having vectors having q dimensions.
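The conversion step in the abstract can be illustrated with a minimal sketch. It assumes that a 1-of-p representation means an indicator vector with a single 1 among p positions, one position per distinct category value. The function names `one_of_p` and `convert`, the sorted category ordering, and the sample data are hypothetical choices for illustration and are not taken from the patent.

```python
def one_of_p(values):
    """Encode a categorical column as 1-of-p indicator rows.

    p is the number of distinct values; categories are ordered
    alphabetically here (an arbitrary, illustrative choice).
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0.0] * len(categories)
        row[index[v]] = 1.0
        rows.append(row)
    return categories, rows


def convert(data, categorical_columns):
    """Form a converted data set A: numeric attributes are copied,
    categorical attributes are replaced by their 1-of-p columns."""
    n, m = len(data), len(data[0])
    converted = [[] for _ in range(n)]
    for j in range(m):
        column = [row[j] for row in data]
        if j in categorical_columns:
            _, encoded = one_of_p(column)
            for i in range(n):
                converted[i].extend(encoded[i])
        else:
            for i in range(n):
                converted[i].append(float(column[i]))
    return converted


# n = 4 objects, m = 3 attributes (one numeric, two categorical).
data = [
    [1.5, "red", "small"],
    [2.0, "blue", "large"],
    [0.5, "red", "large"],
    [3.0, "green", "small"],
]
A = convert(data, categorical_columns={1, 2})
```

In this example the three original attributes expand to six columns: one numeric column, three for the color values, and two for the size values.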

Attorney, Agent or Firm:
Tran, Esq., Khanh Q.; Banner & Witcoff, Ltd.

Primary / Asst. Examiners:
Alam, Hosain T.; Corrielus, Jean M

Maintenance Status:
E2 Expired; CC Certificate of Correction issued

Family:
None

First Claim (of 30 claims):
What is claimed is:
1. A method performed by a computer for clustering data points in a data set, the data set being arranged as a matrix having n objects and m attributes, the method comprising the steps of:
- converting each categorical attribute of the data set to a 1-of-p representation of the categorical attribute;
- forming a converted data set A based on the data set and the 1-of-p representation for each categorical attribute;
- compressing the converted data set A to obtain q basis vectors, with q being defined to be at least m+1;
- projecting the converted data set onto the q basis vectors to form a data matrix having at least one vector, each vector having q dimensions; and
- performing a clustering technique on the data matrix having vectors having q dimensions.
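
The compressing, projecting, and clustering steps of the claim can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: it uses Singular Value Decomposition (which the abstract names as one example of a compression technique) via NumPy, takes the top q right singular vectors as the basis, and uses a plain k-means loop as the clustering technique, which the claim leaves unspecified. The helper names `svd_basis` and `kmeans`, the sample matrix, and the choice k = 2 are hypothetical.

```python
import numpy as np


def svd_basis(A, q):
    """Return the top-q right singular vectors of A as columns."""
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[:q].T


def kmeans(X, k, iterations=50, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iterations):
        # Distance from every point to every center.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# A converted data set: one numeric column followed by 1-of-p columns
# for a 3-valued and a 2-valued categorical attribute (m = 3 originally).
A = np.array([
    [0.1, 0, 0, 1, 0, 1],
    [0.2, 0, 0, 1, 0, 1],
    [0.0, 0, 0, 1, 0, 1],
    [5.0, 1, 0, 0, 1, 0],
    [5.1, 1, 0, 0, 1, 0],
    [4.9, 1, 0, 0, 1, 0],
], dtype=float)

m = 3
q = m + 1                      # the claim requires q to be at least m + 1
basis = svd_basis(A, q)        # shape (6, q)
projected = A @ basis          # one q-dimensional vector per object
labels, centers = kmeans(projected, k=2)
```

Each row of `projected` is a q-dimensional vector for one object, and `labels` assigns each object to one of the two clusters.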

Forward References:
8 U.S. patents reference this one

Foreign References:
None

Other References:
Chaudhuri et al., "A novel multiseed nonhierarchical data clustering technique", IEEE, vol. 27, No. 5, pp. 871-877, Oct. 1997.*
Burd et al., "Investigating component based maintenance and the effect of software evolution: a reengineering approach using data clustering", IEEE, pp. 199-206, Jan. 1997.*
Liu et al., "Feature selection via discretization", IEEE, vol. 9, No. 4, pp. 642-645, Jul. 1997.*
J. C. Gower, "A General Coefficient of Similarity and Some of its Properties," Biometrics 27, 857-874, Dec. 1971.
H. Ralambondrainy, "A conceptual version of the K-means algorithm," Pattern Recognition Letters 16, 1995, pp. 1147-1157.
M. Berger et al., "An Algorithm for Point Clustering and Grid Generation," IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, No. 5, 1991, pp. 1278-1286.
D.H. Fisher, "Knowledge Acquisition Via Incremental Conceptual Clustering," Jul. 4, 1987, pp. 267-283.
J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," pp. 281-297.
R.S. Michalski, "Chapter 4: A Theory and Methodology of Inductive Learning," Machine Learning: An Artificial Intelligence Approach, Springer, New York, pp. 83-134.
Duda and Hart, "Pattern Classification and Scene Analysis," New York: Wiley, 1973, pp. 210-257.
M. Ester et al., "A Database Interface for Clustering in Large Spatial Databases," Conference on Knowledge Discovery and Data, pp. 94-99.
S.Z. Selim et al., "K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, No. 1, Jan. 1984, pp. 81-87.
Z. Huang, "A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining," pp. 1-8.
Eui-Hong Han et al., "Clustering Based on Association Rule Hypergraphs," pp. 9-13.
U.M. Fayyad et al., "Conceptual Clustering in Structured Databases: a Practical Approach," KDD-95 Proceedings, pp. 180-185.
R.T. Ng et al., "Efficient and Effective Clustering Methods for Spatial Data Mining," Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 144-155.
S. D. Lee et al., "Maintenance of Discovered Association Rules: What to update?" pp. 51-58.
T. Zhang et al., "Birch: An Efficient Data Clustering Method for Very Large Databases," SIGMOD '96, Jun. 1996, pp. 103-114.
S.W. Wharton, "A generalized Histogram Clustering Scheme for Multidimensional Image Data," Pattern Recognition, vol. 16, No. 2, 1983, pp. 193-199.
B. Ripley, "Pattern Recognition and Neural Networks," Cambridge University Press, 1996, pp. 311-322.
Raghavan et al., "Latent Semantic Indexing: A Probabilistic Analysis," 1998, pp. 1-15.
M. Zait et al., "A Comparative Study of Clustering," pp. 1-12.
