 |
 |
|
|
|
|
Title: |
US5742811:
Method and system for mining generalized sequential patterns in a large database
[ Derwent Title ]

|
Country: |
US United States of America

|
| |
Inventor: |
Agrawal, Rakesh; San Jose, CA
Srikant, Ramakrishnan; San Jose, CA

|
Assignee: |
International Business Machines Corporation, Armonk, NY
other patents from INTERNATIONAL BUSINESS MACHINES CORPORATION (280070) (approx. 44,393)
News, Profiles, Stocks and More about this company

|
Published / Filed: |
1998-04-21
/ 1995-10-10

|
Application Number: |
US1995000541665

|
IPC Code: |
Advanced:
G06F 17/30;
Core:
more...
IPC-7:
S06F 17/30;

|
ECLA Code: |
G06F17/30S4P8D; G06F17/30T;

|
U.S. Class: |
Current:
707/006;
707/005;
707/E17.058;
Original:
395/606;
395/606;

|
Field of Search: |
395/605,606

|
Priority Number: |
| 1995-10-10 |
US1995000541665 |

|
Abstract: |
A method and apparatus are disclosed for mining generalized sequential patterns from a large database of data sequences, taking into account user specified constraints on the time-gap between adjacent elements of the patterns, sliding time-window, and taxonomies over data items. The invention first identifies the items with at least a minimum support, i.e., those contained in more than a minimum number of data sequences. The items are used as a seed set to generate candidate sequences. Next, the support of the candidate sequences are counted. The invention then identifies those candidate sequences that are frequent, i.e., those with a support above the minimum support. The frequent candidate sequences are entered into the set of sequential patterns, and are used to generate the next group of candidate sequences. Preferably, the candidate sequences are generated by joining previously found frequent candidate sequences, and candidate sequences having a contiguous subsequence without minimum support are discarded. In addition, the invention includes a hash-tree data structure for storing the candidate sequences and memory management techniques for performance improvement.

|
Attorney, Agent or Firm: |
Tran, Khanh Q. ;
Pintner, James C. ;

|
Primary / Asst. Examiners: |
Amsbury, Wayne;

|
INPADOC Legal Status: |
Show legal status actions

|
Family: |
None

|
First Claim:
Show all 45 claims |
What is claimed is:
1. A method for identifying sequential patterns in data sequences stored in a database, each data sequence having a plurality of temporally-spaced transactions, each transaction being characterized by one or more items, the items having a taxonomy which defines descendant and ancestor relationships between the items, the method comprising the steps of:
- determining the items having at least a minimum support, the support of an item being a number of data sequences that contain the item, the determined items initially making up a seed set;
- generating candidate sequences, if any, from the seed set;
- counting the support for the candidate sequences;
- identifying those candidate sequences which are frequent, if any, a candidate sequence being frequent if it has at least the minimum support;
- adding the frequent candidate sequences to an output set of sequential patterns; and
- repeating the method steps starting with the generating step, using the just identified frequent candidate sequences as the seed set, until no candidate sequences are generated or no frequent candidate sequences are identified.

|
Background / Summary: |
Show background / summary

|
Drawing Descriptions: |
Show drawing descriptions

|
Description: |
Show description

|
Forward References: |
Show 43 U.S. patent(s) that reference this one

|
 |
 |
|
|
|
|
Foreign References: |
None

|
Other Abstract Info: |
DERABS G98-260880
DERG98-260880

|
Other References: |
Agrawal et al., An Interval Classifier for Database Mining Applications, Proc of the 18th VLDB Conference, Vancouver, British Columbia, Aug. 31, 1992, pp. 560-573.
Dietterich et al., Discovering Patterns in Sequences of Events, Artificial Intelligence, Elsevier Science Publishers B.V. (North Holland), pp. 187-232, 1985.
(46 pages)
R. Agrawal et al., Mining Sequential Patterns, Int'l Conference on Data Engineering, pp. 3-14, Mar. 1995.
H. Mannila et al., Discovering Frequent Episodes in Sequences (Extended Abstract) Int'l Conference on Knowledge Discovery in Databases and Data Mining, Dec. 1993, KDD-95, pp. 210-215.
R. Agrawal et al., Fast Algorithms for Mining Association Rules, Proceedings of the 20th VLDB Conf., Santiago, Chile, pp. 487-499, 1994.
J Wang et al., Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results, Proc. of the ACM SIGMOD Conf. on Management of Data, Minneapolis, pp. 115-125, May 1994.
R. Agrawal et al., Mining Association Rules Between Sets of Items in Large Databases, Proc. of the ACM SIGMOD Conf. on Management of Data, pp. 207-216, Washington, D.C. May 1993.
R. Agrawal et al., Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Eng. Special Issue on Learning and Discovery in Knowledge-Based Databases, pp. 914-925, Dec. 1993.
(12 pages)
Cited by 35 patents
[ISI abstract]
R. Srikant et al., Mining Generalized Association Rules, IBM Research Report 9963 (87922), Jun. 27, 1995.
M. Houtsma et al., Set-Oriented Mining for Association Rules, Research Report 9567 (83573) Oct. 22, 1993.

|


|
Nominate this for the Gallery...

|
|