Title: |
US5724573:
Method and system for mining quantitative association rules in large relational tables

|
Country: |
US United States of America

|
Inventor: |
Agrawal, Rakesh; San Jose, CA
Srikant, Ramakrishnan; San Jose, CA

|
Assignee: |
International Business Machines Corporation, Armonk, NY

|
Published / Filed: |
1998-03-03 / 1995-12-22

|
Application Number: |
US1995000577945

|
IPC Code: |
Advanced:
G06F 17/30;
Core:
IPC-7:
G06F 17/30;

|
ECLA Code: |
G06F17/30T;

|
U.S. Class: |
Current:
707/006;
707/001;
707/003;
707/E17.058;
Original:
395/606;
395/601;
395/603;

|
Field of Search: |
395/601,603,606

|
Priority Number: |
1995-12-22 US1995000577945

|
Abstract: |
A method and apparatus are disclosed for mining quantitative association rules from a relational table of records. The method comprises the steps of: partitioning the values of selected quantitative attributes into intervals, combining adjacent attribute values and intervals into ranges, generating candidate itemsets, determining frequent itemsets, and outputting an association rule when the support for a frequent itemset bears a predetermined relationship to the support for a subset of the frequent itemset. Preferably, the partitioning step includes determining whether to partition and the number of partitions based on a partial incompleteness measure. The candidate generation includes discarding those itemsets not meeting a user-specified interest level and those having a subset which is not a frequent itemset. The frequent itemsets are determined using super-candidates that include information of the candidate itemsets. Preferably, each super-candidate has a data structure, such as a multi-dimensional tree or array, representing quantitative attributes common to the replaced candidate itemsets.

|
Attorney, Agent or Firm: |
Tran, Khanh Q. ;
Pintner, James C. ;

|
Primary / Asst. Examiners: |
Amsbury, Wayne;

|
Family: |
None

|
First Claim (1 of 30): |
What is claimed is:
1. A method for identifying quantitative association rules from a table of records, each record having a plurality of attributes associated therewith, the attributes including quantitative and categorical attributes, each attribute having a value, the method comprising the steps of:
- partitioning the values of each quantitative attribute from a selected group of quantitative attributes into a respective plurality of intervals;
- determining a support for each value of the categorical attributes and the non-partitioned quantitative attributes, and a support for each interval of the partitioned quantitative attributes, the support for a value being a number of records in the table whose attribute values include the value, the support for an interval being a number of records in the table whose attribute values are part of the interval;
- for each quantitative attribute, combining adjacent values of the attribute if the attribute is not partitioned, or adjacent intervals of the attribute if the attribute is partitioned, into ranges, as long as the support for each range is less than a maximum support;
- identifying items with at least a minimum support, each item representing a quantitative attribute and a range, or a categorical attribute and a value, the items with at least the minimum support making up a seed set;
- generating candidate itemsets from the seed set, each itemset being a set of items and having a support, the support of the itemset being a number of records in the table which support the itemset;
- determining frequent itemsets from the candidate itemsets, the frequent itemsets being those itemsets whose support is more than the minimum support, the determined frequent itemsets becoming the next seed set;
- repeating the steps of generating candidate itemsets and determining frequent itemsets until all the frequent itemsets are found; and
- outputting an association rule when the support of a selected frequent itemset bears a predetermined relationship to the support of a subset of the selected frequent itemset, thereby satisfying a minimum confidence constraint, the association rule being an expression of the form X ⇒ Y where X and Y are itemsets.
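
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the sample table, and the thresholds are hypothetical, supports are plain record counts, and the intervals for the partitioned attribute are supplied by hand rather than derived from a partial incompleteness measure. The sketch uses "at least" for the minimum-support test throughout, and it omits the interest-level pruning and the super-candidate data structures mentioned in the abstract.

```python
from itertools import combinations

def base_items(records, attr, intervals=None):
    """Base items (attr, lo, hi): the given intervals, or one per distinct value."""
    if intervals is None:
        intervals = [(v, v) for v in sorted({r[attr] for r in records})]
    return [(attr, lo, hi) for lo, hi in intervals]

def support(records, itemset):
    """Number of records whose values fall inside every item's range."""
    return sum(all(lo <= r[attr] <= hi for attr, lo, hi in itemset) for r in records)

def with_ranges(records, base, max_sup):
    """Keep each base item; also merge runs of adjacent ones while support < max_sup."""
    out = []
    for i in range(len(base)):
        for j in range(i, len(base)):
            item = (base[i][0], base[i][1], base[j][2])
            if j > i and support(records, [item]) >= max_sup:
                break
            out.append(item)
    return out

def frequent_itemsets(records, items, min_sup):
    """Level-wise search: each level's frequent itemsets seed the next level."""
    seed = [frozenset([it]) for it in items if support(records, [it]) >= min_sup]
    frequent = {}
    while seed:
        frequent.update((s, support(records, s)) for s in seed)
        known, k = set(seed), len(seed[0]) + 1
        # Join step: unions of size k that hold one item per attribute.
        cands = {a | b for a in seed for b in seed
                 if len(a | b) == k and len({it[0] for it in a | b}) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        seed = [c for c in cands
                if all(frozenset(s) in known for s in combinations(c, k - 1))
                and support(records, c) >= min_sup]
    return frequent

def rules(frequent, min_conf):
    """Emit (X, Y, confidence) when support(X and Y together) / support(X) >= min_conf."""
    out = []
    for itemset, sup in frequent.items():
        for n in range(1, len(itemset)):
            for x in map(frozenset, combinations(itemset, n)):
                conf = sup / frequent[x]
                if conf >= min_conf:
                    out.append((x, itemset - x, conf))
    return out

# Hypothetical sample table: "age" is partitioned, "cars" is quantitative but
# not partitioned, "married" is categorical.
people = [
    {"age": 23, "married": "no", "cars": 1},
    {"age": 25, "married": "yes", "cars": 1},
    {"age": 29, "married": "no", "cars": 0},
    {"age": 34, "married": "yes", "cars": 2},
    {"age": 38, "married": "yes", "cars": 2},
]
age_base = base_items(people, "age", [(20, 24), (25, 29), (30, 34), (35, 39)])
age_items = with_ranges(people, age_base, max_sup=4)
car_items = with_ranges(people, base_items(people, "cars"), max_sup=4)
married_items = base_items(people, "married")  # categorical values are never merged

frequent = frequent_itemsets(people, age_items + car_items + married_items, min_sup=2)
found = rules(frequent, min_conf=0.9)
for x, y, conf in found:
    print(sorted(x), "=>", sorted(y), conf)
```

Representing every item as an (attribute, low, high) triple lets categorical values, single quantitative values, and merged ranges share one support test. That is a convenience of this sketch, not something the claim prescribes.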

|
Forward References: |
72 U.S. patents reference this one

|
Foreign References: |
None

|
Other Abstract Info: |
DERABS G98-178893
DERG98-178893

|
Other References: |
DeWitt et al, "The Gamma Database Machine Project", IEEE Trans. Knowledge & Data Engineering, Mar. 31, 1990.
Mannila et al, "Improved Methods for Finding Association Rules", Pub. No. c-1993-65, University of Helsinki, 193, Dec. 31, 1993.
Park et al, "Efficient Data Mining for Association Rules", IBM Research Report, R210156, Aug. 31, 1995.
R. Agrawal, T. Imielinski, A Swami, Mining Association Rules Between Sets of Items in Large Databases, In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 207-216, Washington, D.C. May 1993.
R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules, In Proc. of the VLDB Conference, Santiago, Chile, pp. 487-499, Sep. 1994.
N. Beckmann, H. Kriegel, R. Schneider, B. Seeger, The R*-tree: An Efficient and Robust Access Method for Points and Rectangles, In Proc. of ACM SIGMOD, pp. 322-331, Atlantic City, NJ, May 1990.
R. T. Ng, J. Han, Efficient and Effective Clustering Methods for Spatial Data Mining, In Proc. of the VLDB Conference, Santiago, Chile, pp. 144-155, Sep. 1994.
J. S. Park, M. Chen, P. S. Yu, An Effective Hash-Based Algorithm for Mining Association Rules, In Proc. of the ACM-SIGMOD Conference on Management of Data, pp. 175-186, San Jose, California, May 1995.
G. P. Shapiro, Discovery, Analysis, and Presentation of Strong Rules, Knowledge Discovery in Databases, pp. 229-248, AAAI/MIT Press, Menlo Park, CA, 1991 (GTE Lab. Incorporated).
M. Houtsma, A. Swami, Set-Oriented Mining for Association Rules, IBM Research Report 9567 (83573), Computer Science, Oct. 22, 1993.
R. Srikant, R. Agrawal, Mining Generalized Association Rules, In Proc. of the VLDB Conference, pp. 407-419, Zurich, Switzerland, Sep. 1995.
J. Han, Y. Fu, Discovery of Multiple-Level Association Rules from Large Databases, In Proc. of the VLDB Conference, pp. 420-431, Zurich, Switzerland, Sep. 1995.
A. Savasere, E. Omiecinski, S. Navathe, An Efficient Algorithm for Mining Association Rules in Large Databases, Proceedings of the 21st VLDB Conference pp. 432-444, Zurich, Switzerland, Sep. 1995.
