 |
 |
|
|
|
|
Title: |
US6269376:
Method and system for clustering data in parallel in a distributed-memory multiprocessor system
[ Derwent Title ]

|
Country: |
US United States of America

|
| |
Inventor: |
Dhillon, Inderjit Singh; Berkeley, CA
Modha, Dharmendra Shantilal; San Jose, CA

|
Assignee: |
International Business Machines Corporation, Armonk, NY
other patents from INTERNATIONAL BUSINESS MACHINES CORPORATION (280070) (approx. 44,393)
News, Profiles, Stocks and More about this company

|
Published / Filed: |
2001-07-31
/ 1998-10-26

|
Application Number: |
US1998000179027

|
IPC Code: |
Advanced:
G06F 17/18;
G06K 9/62;
Core:
more...
IPC-7:
G06F 17/00;

|
ECLA Code: |
G06F17/18; G06K9/62B1;

|
U.S. Class: |
Current:
707/101;
707/006;
707/102;
Original:
707/101;
707/006;
707/102;

|
Field of Search: |
707/003-6,200-206,101-103
704/009,2-8,241
725/116

|
Priority Number: |
| 1998-10-26 |
US1998000179027 |

|
Abstract: |
A method, apparatus, article of manufacture, and a memory structure for clustering data points in parallel using a distributed-memory multi-processor system is disclosed. The disclosed system has particularly advantageous application to a rapid and flexible k-means computation for data mining. The method comprises the steps of dividing a set of data points into a plurality of data blocks, initializing a set of k global centroid values in each of the data blocks k initial global centroid values, performing a plurality of asynchronous processes on the data blocks, each asynchronous process assigning each data point in each data block to the closest global centroid value within each data block, computing a set of k block accumulation values from the data points assigned to the k global centroid values, and recomputing the k global centroid values from the k block accumulation values.

|
Attorney, Agent or Firm: |
Gates & Cooper LLP ;

|
Primary / Asst. Examiners: |
Black, Thomas; Jung, David

|
INPADOC Legal Status: |
Show legal status actions

|
Family: |
None

|
First Claim:
Show all 42 claims |
What is claimed is:
1. A method of clustering a set of data points into k clusters, comprising:
- (a) dividing the set of data points into P data blocks of substantially equal size, each data block assigned to one of P processors;
- (b) selecting k initial global centroids with a first processor and broadcasting the k initial global centroids from the first processor to the remaining P-1 processors;
- (c) computing the distance from each data point in each data block to the global centroid values by using the processor associated with the data block;
- (d) assigning each data point in each data block to a global centroid value closest to the data point by using the processor associated with the data block;
- (e) computing k block accumulation values in each block from the data points assigned thereto; and
- (f) recomputing the k global centroid values from the k block accumulation values computed for each data block.

|
Background / Summary: |
Show background / summary

|
Drawing Descriptions: |
Show drawing descriptions

|
Description: |
Show description

|
Forward References: |
Show 13 U.S. patent(s) that reference this one

|
 |
 |
|
|
|
|
Foreign References: |
None

|
Other References: |
Ding et al., "Necessary conditions on minimal system configuration for general MISO Mamdani Fuzzy Systems as Universal Approximators", Systems, Man and Cybernetics, Part B, IEEE Transactions on, vol. 30, Issue 6, pp. 857-864, Dec. 2000.*
(8 pages)
[ISI abstract]
Paliwal et al., "Comments on `modified K-means algorithm for vector quantizer design`", Image Processing, IEEE Transactions on, vol. 9, Issue 11, pp. 1964-1967, Nov. 2000.*
(4 pages)
[ISI abstract]
Lee et al., "Morphology-based three-dimensional interpolation", Medical Imaging, IEEE Transactions on, vol. 19, Issue 7, pp. 711-721, Jul. 2000.*
(11 pages)
[ISI abstract]
Berger, Marsha et al., "An Algorithm for Point Clustering and Grid Generation," IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, No. 5, Sep./Oct. 1991, pp. 1278-1286.
(9 pages)
Cited by 5 patents
[ISI abstract]
Duda, Richard O. et al., "Pattern Classification and Scene Analysis," John Wiley & Sons, New York, 1973, pp. 210-257.
Ester, Martin et al., "A Database Interface for Clustering in Large Spatial Databases," Conference on Knowledge Discovery and Data Mining, KDD-95, AAAI Press, 1996, pp. 94-99.
Fisher, Douglas H., "Knowledge Acquisition Via Incremental Conceptual Clustering," pp. 267-283 (originally published in Machine Learning, No. 2:139-172); 1987.
Fukunaga, K. et al., "A Branch and Bound Algorithm for Computing .kappa.-Nearest Neighbors," IEEE Transactions on Computers, Jul. 1975, pp. 750-753.
(4 pages)
Han, Eui-Hong (Sam) et al., "Clustering Based on Associated Rule Hypergraphs," Proceedings of the Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD '97), Tucson, Arizona, 1997, pp. 9-13.
Huang, Zhexue, "A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining," SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, SIGMOD-DMKD 1997, pp. 1-8.
Keim, Daniel A., "Enhancing Visual Clustering of Query-Dependent Database Visualization Techniques Using Screen-Filling Curves," Proceedings of the Workshop on Database Issues for Data Visualization, Atlanta, Georgia, 1995, pp. 101-110.
Ketterlin, A. et al., "Conceptual Clustering in Structured Databases: a Practical Approach," Proceedings of the First International Conference on Knowledge Discovery & Data Mining, KDD-95, AAAI Press, 1996, pp. 180-185.
Ng, Raymond T. et al., "Efficient and Effective Clustering Methods for Spatial Data Mining," Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 144-155.
Pollard, David, "Quantization and the Method of .kappa.-Means," IEEE Transactional on Information Theory, vol. IT-28, No. 2, Mar. 1982, pp. 199-205.
(7 pages)
Cited by 2 patents
Pollard, David, "A Central Limit Theorem for .kappa.-Means Clustering," The Annals of Probability, vol. 10, No. 4, 1982, pp. 919-926.
(8 pages)
Cited by 2 patents
Ripley, Brian, "Pattern Recognition & Neural Networks," Cambridge University Press, 1996, pp. 311-322.
Ralambondrainy, H., "A Conceptual Version of the .kappa.-means Algorithm," Pattern Recognition Letters 16, 1995, pp. 1147-1157.
(11 pages)
Cited by 5 patents
[ISI abstract]
Selim, Shokri Z. et al., ".kappa.-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, No. 1, Jan. 1984, pp. 81-87.
(7 pages)
Cited by 5 patents
Wharton, Stephen W., "A Generalized Histogram Clustering Scheme for Multidimensional Image Data," Pattern Recognition, vol. 16, No. 2, 1983, pp. 193-199.
(7 pages)
Cited by 2 patents

|


|
Nominate this for the Gallery...

|
|