Work Files Saved Searches
   My Account                                                  Search:   Quick/Number   Boolean   Advanced   Derwent    Help   


 The Delphion Integrated View

  Buy Now:   Buy PDF- 20pp  PDF  |   File History  |   Other choices   
  Tools:  Citation Link  |  Add to Work File:    
  View:  Expand Details   |  INPADOC   |  Jump to: 
  Go to:  Derwent  
 Email this to a friend  Email this to a friend 
       
Title: US5787274: Data mining method and system for generating a decision tree classifier for data records based on a minimum description length (MDL) and presorting of records
[ Derwent Title ]


Country: US United States of America

View Images High
Resolution

 Low
 Resolution

 
20 pages

 
Inventor: Agrawal, Rakesh; San Jose, CA
Mehta, Manish; San Jose, CA
Rissanen, Jorma Johannes; Los Gatos, CA

Assignee: International Business Machines Corporation, Armonk, NY
other patents from INTERNATIONAL BUSINESS MACHINES CORPORATION (280070) (approx. 44,393)
 News, Profiles, Stocks and More about this company

Published / Filed: 1998-07-28 / 1995-11-29

Application Number: US1995000564694

IPC Code: Advanced: G06F 17/30;
Core: more...
IPC-7: G06F 17/30;

ECLA Code: G06F17/30T4; G06F17/30T1P3;

U.S. Class: Current: 707/102; 707/100; 707/101; 707/E17.087; 707/E17.089;
Original: 395/613; 395/611; 395/612;

Field of Search: 395/613,611,600,612

Priority Number:
1995-11-29  US1995000564694

Abstract:     A method and apparatus are disclosed for generating a decision tree classifier from a training set of records. The method comprises the steps of: pre-sorting the records based on each numeric record attribute, creating a decision tree breadth-first, and pruning the tree based on the MDL principle. Preferably, the pre-sorting includes generating a class list and attribute lists, and independently sorting the numeric attribute lists. The growing of the tree includes evaluating possible splitting criteria and selecting a splitting test for each leaf node, based on a splitting index, and updating the class list to reflect new leaf nodes. In a preferred embodiment, the splitting index is a gini index. The pruning preferably includes encoding the decision tree and splitting tests in an MDL-based code, and determining whether to convert a node into a leaf node, prune its child nodes, or leave the node intact, based on the code length of the node.

Attorney, Agent or Firm: Tran, Khanh Q. ;

Primary / Asst. Examiners: Black, Thomas G.; Robinson, Greta L.

INPADOC Legal Status: Show legal status actions

Family: None

First Claim:
Show all 42 claims
What is claimed is:     1. A method for generating a decision tree classifier from a training set of records, each record having at least one numeric attribute and a class label of a class to which the record belongs, the method comprising the steps of:
  • pre-sorting the records based on the numeric attribute;
  • generating an attribute list for each attribute and a class list, each entry of the class list corresponding to a class label, the attribute list including values of the attribute and indices to the class labels;
  • creating a decision tree from the pre-sorted records, attribute lists, and class list, using a breadth-first process, the decision tree having a root node, a plurality of interior nodes, and a plurality of leaf nodes, the breadth-first process being such that the nodes of a same depth from the root node are formed in parallel; and
  • pruning the decision tree based on a Minimum Description Length (MDL) scheme to obtain the decision tree classifier, the MDL scheme encoding the decision tree as a model such that an encoding cost for describing the decision tree and the training set is minimized, and the method taking into account the encoding cost in the pruning.


Background / Summary: Show background / summary

Drawing Descriptions: Show drawing descriptions

Description: Show description

Forward References: Show 54 U.S. patent(s) that reference this one

       
U.S. References: Go to Result Set: All U.S. references   |  Forward references (54)   |   Backward references (2)   |   Citation Link

Buy
PDF
Patent  Pub.Date  Inventor Assignee   Title
Buy PDF- 8pp US4719571  1988-01 Rissanen et al.  International Business Machines Corporation Algorithm for constructing tree structured classifiers
Buy PDF- 55pp US5490060  1996-02 Malec et al.  Information Resources, Inc. Passive data collection system for market research data
       
Foreign References: None

Other Abstract Info: DERABS G98-436890 DERG98-436890

Other References:
  • Agrawal et al., "Quest: A project on Database Mining", SIGMOD Conference http://sunsite.ust.hkdblp/db/conf/sigmod/sigmod94-514.html, pp.1-2, 1994.
  • Shani et al., Fundamentals Of Data Structures, chapter 6 section 6.2, pp. 292-301, 1983.
  • R. Agrawal et al., An Interval Classifier for Database Mining Applications, Proceedings of the 18th VLDB Conference Vancouver, British Columbia, Aug. 1992.
  • R. Agrawal et al., Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Engineering, vol. 5, No. 6, pp. 914-925, Special Issue on Learning and Discovery in Knowledge-Based Databases, Dec. 1993. (12 pages) Cited by 35 patents [ISI abstract]
  • L. Breiman (Univ. of CA-Berkeley) et al. Classification and Regression Trees (Book) Chapter 2. Introduction to Tree Classification pp. 18-58, Wadsworth International Group, Belmont, CA 1984.
  • J. Catlett, Megainduction: Machine Learning on Very Large Databases, PhD thesis, Univ. of Syndey, Jun./Dec. 1991.
  • P. K. Chan et al., Experiments on Multistrategy Learning by Meta-learning. In Proc. Second Intl. Conf. on Info. and Knowledge Mgmt., pp. 314-323, 1993.
  • S. J. Hong, R-MINI: A heuristic Algorithm for Generating Minimal Rules from Examples. In 3rd Pacific Rim Int'l Conference on Artificial Intelligence, pp. 331-337, Aug. 1994.
  • U. Fayyad et al., The Attribute, Selection Problem in Decision Tree Generation. In 105h Nat'l Conf. on AI AAAI-92, Learning: Inductive 1992.
  • M. James, Classification Algorithms (book), Chapters 1-3, QA278.65, J281 Wiley-Interscience Pub., 1985.
  • M. Mehta et al., Mdl-based Decision Tree Pruning. Int'l Conference on Knowledge Discovery in Databases and Data Mining (KDD-95) Montreal, Canada, pp. 216-221, Aug. 1995.
  • J. R. Quinlan et al., Inferring Decision Trees Using Minimum Description Length Principle, Information and Computation 80, pp. 227-248, 1989. (0890-5401/89 Academic Press, Inc.). (22 pages) Cited by 15 patents
  • Wallace et al., Coding Decision Trees, Machine Learning, 11, pp. 7-22, 1993. (Kluwer Academic Pub., Boston. Mfg. in the Netherlands.). (16 pages) Cited by 4 patents [ISI abstract]
  • S. M. Weiss et al., Computer Systems that Learn, Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, pp. 113-143, 1991. Q325.5, W432, C2, Morgan Kaufman Pub. Inc., San Mateo, CA.
  • S. Murthy et al., Decision Tree Induction: How Effective is the Greed Heuristic? Int'l Conference on Knowledge Discovery in Databases and Data Mining (KDD 95) Montreal, Canada, pp. 222-227.
  • No. 08/436,794, filed Apr. 14, 1994, for System and Method for Mining Generalized Association Rules in Databases U.S. Pat. No. 5,615,341 (presently unavailable to view).


  • Inquire Regarding Licensing

    Powered by Verity


    Plaques from Patent Awards      Gallery of Obscure PatentsNominate this for the Gallery...

    Thomson Reuters Copyright © 1997-2010 Thomson Reuters 
    Subscriptions  |  Web Seminars  |  Privacy  |  Terms & Conditions  |  Site Map  |  Contact Us  |  Help