Title: |
US6230151:
Parallel classification for data mining in a shared-memory multiprocessor system

|
Country: |
US United States of America

|
Inventor: |
Agrawal, Rakesh; San Jose, CA
Ho, Ching-Tien; San Jose, CA
Zaki, Mohammed J.; Rochester, NY

|
Assignee: |
International Business Machines Corporation, Armonk, NY

|
Published / Filed: |
2001-05-08 / 1998-04-16

|
Application Number: |
US1998000061808

|
IPC Code: |
Advanced:
G06F 9/45;
G06F 17/30;
IPC-7:
G06F 15/18;

|
ECLA Code: |
G06F17/30S8R1; G06F9/45M3;

|
U.S. Class: |
Current:
706/012;
707/101;
Original:
706/012;
706/012;
707/101;

|
Field of Search: |
395/706
705/006
706/049,50,12
707/001,101

|
Priority Number: |
1998-04-16 / US1998000061808

|
Abstract: |
A method and system for generating a decision-tree classifier in parallel in a shared-memory multiprocessor system is disclosed. The processors first generate in the shared memory an attribute list for each record attribute. Each attribute list is assigned to a processor. The processors independently determine the best splits for their respective assigned lists, and cooperatively determine a global best split for all attribute lists. The attribute lists are reassigned to the processors and split according to the global best split into the lists for child nodes. The split attribute lists are again assigned to the processors and the process is repeated for each new child node until each attribute list for the new child nodes includes only tuples of the same record class or a fixed number of tuples.

|
Attorney, Agent or Firm: |
Tran, Khanh Q. ;
McSwain, Marc D. ;

|
Primary / Asst. Examiners: |
Lintz, Paul R.; Khatri, Anil

|
Maintenance Status: |
E2 Expired

|
Family: |
None

|
First Claim (1 of 51): |
What is claimed is:
1. A method for generating a decision-tree classifier in a shared-memory multiprocessor system from a set of records, the tree having a plurality of nodes, the method comprising the steps of:
- (a) generating cooperatively by the processors, in the shared memory, an attribute list for each attribute of the records, the attribute lists corresponding to a current node and including tuples each having information on a record class;
- (b) assigning each attribute list of the current node to one of the processors;
- (c) each processor accessing the attribute lists assigned to the processor, in the shared memory, to determine a best split for each attribute list;
- (d) the processors cooperatively determining, through the shared memory, a global best split for all the attribute lists associated with the current node;
- (e) reassigning each attribute list of the current node to one of the processors;
- (f) each processor splitting the attribute lists reassigned to the processor according to the global best split into new attribute lists, the new lists corresponding to child nodes of the current node and residing in the shared memory; and
- (g) repeating steps (b)-(f) with each newly created child node as the current node, until each attribute list for the newly created child nodes includes only tuples of the same record class.
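
Steps (a)-(g) can be illustrated with a minimal sketch. It is not the patented implementation: worker threads stand in for processors, ordinary Python lists in one process stand in for shared memory, all attributes are assumed numeric, and the Gini index is assumed as the split criterion (the claim does not name one). All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def gini(counts, n):
    """Gini impurity of a class histogram holding n tuples."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def build_attribute_lists(records, labels):
    """Step (a): one sorted list of (value, class, record_id) per attribute."""
    return [sorted((rec[a], labels[i], i) for i, rec in enumerate(records))
            for a in range(len(records[0]))]


def best_split(attr_index, attr_list):
    """Step (c): scan one attribute list for its best threshold."""
    n = len(attr_list)
    below, above = Counter(), Counter(cls for _, cls, _ in attr_list)
    best = (float("inf"), attr_index, None)
    for k in range(n - 1):
        value, cls, _ = attr_list[k]
        below[cls] += 1
        above[cls] -= 1
        nxt = attr_list[k + 1][0]
        if value == nxt:          # cannot split between equal values
            continue
        score = ((k + 1) * gini(below, k + 1)
                 + (n - k - 1) * gini(above, n - k - 1)) / n
        if score < best[0]:
            best = (score, attr_index, (value + nxt) / 2)
    return best


def split_list(attr_list, left_ids):
    """Step (f): partition one list, preserving its sort order."""
    left = [t for t in attr_list if t[2] in left_ids]
    right = [t for t in attr_list if t[2] not in left_ids]
    return left, right


def grow(attr_lists, pool):
    """Steps (b)-(g) for one node; recursion handles the child nodes."""
    classes = Counter(cls for _, cls, _ in attr_lists[0])
    majority = classes.most_common(1)[0][0]
    if len(classes) == 1:         # stop: only one record class remains
        return {"leaf": majority}
    # (b)+(c): each attribute list goes to a worker, which finds its best split.
    candidates = list(pool.map(best_split, range(len(attr_lists)), attr_lists))
    # (d): combine the per-list results into the global best split.
    _, attr, threshold = min(candidates, key=lambda c: c[0])
    if threshold is None:         # no attribute can separate the tuples
        return {"leaf": majority}
    left_ids = {rid for value, _, rid in attr_lists[attr] if value < threshold}
    # (e)+(f): lists are handed out again and split by the global best split.
    halves = list(pool.map(lambda lst: split_list(lst, left_ids), attr_lists))
    return {"attr": attr, "threshold": threshold,
            "left": grow([h[0] for h in halves], pool),
            "right": grow([h[1] for h in halves], pool)}


def predict(tree, record):
    """Walk the tree from the root to a leaf for one record."""
    while "leaf" not in tree:
        side = "left" if record[tree["attr"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


# Made-up toy data: (age, salary) records with a class label each.
records = [(23, 15), (40, 60), (35, 20), (52, 75), (29, 48), (61, 30)]
labels = ["high", "low", "high", "low", "low", "high"]

with ThreadPoolExecutor(max_workers=2) as pool:
    tree = grow(build_attribute_lists(records, labels), pool)
```

Because of CPython's global interpreter lock, the threads here show only the division of work among processors, not a real speedup. The sketch also stops only when a node holds a single record class; it omits the alternative stopping condition on a fixed number of tuples mentioned in the abstract.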

|
Forward References: |
Show 23 U.S. patent(s) that reference this one

|
Foreign References: |
None

|
Other References: |
Holt et al., "Efficient mining of association rules in text databases", CIKM ACM, pp 234-242, Jan. 1999.*
Anand et al., "The role of domain knowledge in data mining", CIKM ACM, pp 37-43, Jun. 1995.*
Cromp et al., "Data mining of multidimensional remotely sensed images", CIKM ACM, pp 471-480, Nov. 1993.*
Shafer et al., "SPRINT: a scalable parallel classifier for data mining", Proc. of the 22nd VLDB conf., pp 544-555, 1996.*
Oguchi et al., "Dynamic remote memory acquisition for parallel mining on ATM connected PC cluster", ACM ICS, pp 246-252, Jan. 1991.*
Weiss, "Strip mining on SIMD architecture", ACM, pp 234-243, Jan. 1991.*
Goil et al., "High performance multidimensional analysis of large datasets", ACM DOLAP, pp 34-39, Aug. 1998.*
Muller et al., "A high performance multi structure file system design", ACM, pp 56-67, Mar. 1991.*
Callahan et al., "Parallel implementation of a frontal finite element solver on multiple platform", ACM SAC, pp 491-495, Apr. 1999.*
Kennedy et al., "Optimizing for parallelism and data locality", ACM ICS, pp 323-334, Jun. 1992.*
Agrawal et al., "Automatic subspace clustering of high dimensional data for data mining applications", ACM SIGMOD, pp 94-105, May 1998.*
Li et al., "Free parallel data mining", ACM SIGMOD, pp 541-543, May 1998.*
Shintani et al., "Parallel mining algorithms for generalized association rules with classification hierarchy", ACM SIGMOD, pp 25-36, May 1998.*
Zaki et al., "Parallel classification for data mining on shared memory multiprocessors", IEEE, pp 198-205, 1999.*
Zaki et al., "Evaluation of sampling for data mining of association rules", IEEE, pp 42-50, 1997.*
Agrawal, "Parallel mining of association rules", IEEE, vol. 8, No. 6, pp 962-969, Dec. 1996.*
Park et al., "Efficient parallel data mining for association rules", ACM CIKM, pp 31-36, Jun. 1995.*
Zaki et al., "A localized algorithm for parallel association mining", ACM SPAA, pp 321-330, 1996.
