 |
 |
|
|
|
|
Title: |
US5787274:
Data mining method and system for generating a decision tree classifier for data records based on a minimum description length (MDL) and presorting of records
[ Derwent Title ]

|
Country: |
US United States of America

|
| |
Inventor: |
Agrawal, Rakesh; San Jose, CA
Mehta, Manish; San Jose, CA
Rissanen, Jorma Johannes; Los Gatos, CA

|
Assignee: |
International Business Machines Corporation, Armonk, NY
other patents from INTERNATIONAL BUSINESS MACHINES CORPORATION (280070) (approx. 44,393)
News, Profiles, Stocks and More about this company

|
Published / Filed: |
1998-07-28
/ 1995-11-29

|
Application Number: |
US1995000564694

|
IPC Code: |
Advanced:
G06F 17/30;
Core:
more...
IPC-7:
G06F 17/30;

|
ECLA Code: |
G06F17/30T4; G06F17/30T1P3;

|
U.S. Class: |
Current:
707/102;
707/100;
707/101;
707/E17.087;
707/E17.089;
Original:
395/613;
395/611;
395/612;

|
Field of Search: |
395/613,611,600,612

|
Priority Number: |
| 1995-11-29 |
US1995000564694 |

|
Abstract: |
A method and apparatus are disclosed for generating a decision tree classifier from a training set of records. The method comprises the steps of: pre-sorting the records based on each numeric record attribute, creating a decision tree breadth-first, and pruning the tree based on the MDL principle. Preferably, the pre-sorting includes generating a class list and attribute lists, and independently sorting the numeric attribute lists. The growing of the tree includes evaluating possible splitting criteria and selecting a splitting test for each leaf node, based on a splitting index, and updating the class list to reflect new leaf nodes. In a preferred embodiment, the splitting index is a gini index. The pruning preferably includes encoding the decision tree and splitting tests in an MDL-based code, and determining whether to convert a node into a leaf node, prune its child nodes, or leave the node intact, based on the code length of the node.

|
Attorney, Agent or Firm: |
Tran, Khanh Q. ;

|
Primary / Asst. Examiners: |
Black, Thomas G.; Robinson, Greta L.

|
INPADOC Legal Status: |
Show legal status actions

|
Family: |
None

|
First Claim:
Show all 42 claims |
What is claimed is:
1. A method for generating a decision tree classifier from a training set of records, each record having at least one numeric attribute and a class label of a class to which the record belongs, the method comprising the steps of:
- pre-sorting the records based on the numeric attribute;
- generating an attribute list for each attribute and a class list, each entry of the class list corresponding to a class label, the attribute list including values of the attribute and indices to the class labels;
- creating a decision tree from the pre-sorted records, attribute lists, and class list, using a breadth-first process, the decision tree having a root node, a plurality of interior nodes, and a plurality of leaf nodes, the breadth-first process being such that the nodes of a same depth from the root node are formed in parallel; and
- pruning the decision tree based on a Minimum Description Length (MDL) scheme to obtain the decision tree classifier, the MDL scheme encoding the decision tree as a model such that an encoding cost for describing the decision tree and the training set is minimized, and the method taking into account the encoding cost in the pruning.

|
Background / Summary: |
Show background / summary

|
Drawing Descriptions: |
Show drawing descriptions

|
Description: |
Show description

|
Forward References: |
Show 54 U.S. patent(s) that reference this one

|
 |
 |
|
|
|
|
Foreign References: |
None

|
Other Abstract Info: |
DERABS G98-436890
DERG98-436890

|
Other References: |
Agrawal et al., "Quest: A project on Database Mining", SIGMOD Conference http://sunsite.ust.hkdblp/db/conf/sigmod/sigmod94-514.html, pp.1-2, 1994.
Shani et al., Fundamentals Of Data Structures, chapter 6 section 6.2, pp. 292-301, 1983.
R. Agrawal et al., An Interval Classifier for Database Mining Applications, Proceedings of the 18th VLDB Conference Vancouver, British Columbia, Aug. 1992.
R. Agrawal et al., Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Engineering, vol. 5, No. 6, pp. 914-925, Special Issue on Learning and Discovery in Knowledge-Based Databases, Dec. 1993.
(12 pages)
Cited by 35 patents
[ISI abstract]
L. Breiman (Univ. of CA-Berkeley) et al. Classification and Regression Trees (Book) Chapter 2. Introduction to Tree Classification pp. 18-58, Wadsworth International Group, Belmont, CA 1984.
J. Catlett, Megainduction: Machine Learning on Very Large Databases, PhD thesis, Univ. of Syndey, Jun./Dec. 1991.
P. K. Chan et al., Experiments on Multistrategy Learning by Meta-learning. In Proc. Second Intl. Conf. on Info. and Knowledge Mgmt., pp. 314-323, 1993.
S. J. Hong, R-MINI: A heuristic Algorithm for Generating Minimal Rules from Examples. In 3rd Pacific Rim Int'l Conference on Artificial Intelligence, pp. 331-337, Aug. 1994.
U. Fayyad et al., The Attribute, Selection Problem in Decision Tree Generation. In 105h Nat'l Conf. on AI AAAI-92, Learning: Inductive 1992.
M. James, Classification Algorithms (book), Chapters 1-3, QA278.65, J281 Wiley-Interscience Pub., 1985.
M. Mehta et al., Mdl-based Decision Tree Pruning. Int'l Conference on Knowledge Discovery in Databases and Data Mining (KDD-95) Montreal, Canada, pp. 216-221, Aug. 1995.
J. R. Quinlan et al., Inferring Decision Trees Using Minimum Description Length Principle, Information and Computation 80, pp. 227-248, 1989. (0890-5401/89 Academic Press, Inc.).
(22 pages)
Cited by 15 patents
Wallace et al., Coding Decision Trees, Machine Learning, 11, pp. 7-22, 1993. (Kluwer Academic Pub., Boston. Mfg. in the Netherlands.).
(16 pages)
Cited by 4 patents
[ISI abstract]
S. M. Weiss et al., Computer Systems that Learn, Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, pp. 113-143, 1991. Q325.5, W432, C2, Morgan Kaufman Pub. Inc., San Mateo, CA.
S. Murthy et al., Decision Tree Induction: How Effective is the Greed Heuristic? Int'l Conference on Knowledge Discovery in Databases and Data Mining (KDD 95) Montreal, Canada, pp. 222-227.
No. 08/436,794, filed Apr. 14, 1994, for System and Method for Mining Generalized Association Rules in Databases U.S. Pat. No. 5,615,341 (presently unavailable to view).

|


|
Nominate this for the Gallery...

|
|