Work Files Saved Searches
   My Account                                                  Search:   Quick/Number   Boolean   Advanced   Derwent    Help   


 The Delphion Integrated View

  Buy Now:   Buy PDF- 25pp  PDF  |   File History  |   Other choices   
  Tools:  Citation Link  |  Add to Work File:    
  View:  Expand Details   |  INPADOC   |  Jump to: 
  Go to:  Derwent  
 Email this to a friend  Email this to a friend 
       
Title: US5742811: Method and system for mining generalized sequential patterns in a large database
[ Derwent Title ]


Country: US United States of America

View Images High
Resolution

 Low
 Resolution

 
25 pages

 
Inventor: Agrawal, Rakesh; San Jose, CA
Srikant, Ramakrishnan; San Jose, CA

Assignee: International Business Machines Corporation, Armonk, NY
other patents from INTERNATIONAL BUSINESS MACHINES CORPORATION (280070) (approx. 44,393)
 News, Profiles, Stocks and More about this company

Published / Filed: 1998-04-21 / 1995-10-10

Application Number: US1995000541665

IPC Code: Advanced: G06F 17/30;
Core: more...
IPC-7: S06F 17/30;

ECLA Code: G06F17/30S4P8D; G06F17/30T;

U.S. Class: Current: 707/006; 707/005; 707/E17.058;
Original: 395/606; 395/606;

Field of Search: 395/605,606

Priority Number:
1995-10-10  US1995000541665

Abstract:     A method and apparatus are disclosed for mining generalized sequential patterns from a large database of data sequences, taking into account user specified constraints on the time-gap between adjacent elements of the patterns, sliding time-window, and taxonomies over data items. The invention first identifies the items with at least a minimum support, i.e., those contained in more than a minimum number of data sequences. The items are used as a seed set to generate candidate sequences. Next, the support of the candidate sequences are counted. The invention then identifies those candidate sequences that are frequent, i.e., those with a support above the minimum support. The frequent candidate sequences are entered into the set of sequential patterns, and are used to generate the next group of candidate sequences. Preferably, the candidate sequences are generated by joining previously found frequent candidate sequences, and candidate sequences having a contiguous subsequence without minimum support are discarded. In addition, the invention includes a hash-tree data structure for storing the candidate sequences and memory management techniques for performance improvement.

Attorney, Agent or Firm: Tran, Khanh Q. ; Pintner, James C. ;

Primary / Asst. Examiners: Amsbury, Wayne;

INPADOC Legal Status: Show legal status actions

Family: None

First Claim:
Show all 45 claims
What is claimed is:     1. A method for identifying sequential patterns in data sequences stored in a database, each data sequence having a plurality of temporally-spaced transactions, each transaction being characterized by one or more items, the items having a taxonomy which defines descendant and ancestor relationships between the items, the method comprising the steps of:
  • determining the items having at least a minimum support, the support of an item being a number of data sequences that contain the item, the determined items initially making up a seed set;
  • generating candidate sequences, if any, from the seed set;
  • counting the support for the candidate sequences;
  • identifying those candidate sequences which are frequent, if any, a candidate sequence being frequent if it has at least the minimum support;
  • adding the frequent candidate sequences to an output set of sequential patterns; and
  • repeating the method steps starting with the generating step, using the just identified frequent candidate sequences as the seed set, until no candidate sequences are generated or no frequent candidate sequences are identified.


Background / Summary: Show background / summary

Drawing Descriptions: Show drawing descriptions

Description: Show description

Forward References: Show 43 U.S. patent(s) that reference this one

       
U.S. References: Go to Result Set: All U.S. references   |  Forward references (43)   |   Backward references (5)   |   Citation Link

Buy
PDF
Patent  Pub.Date  Inventor Assignee   Title
Buy PDF- 11pp US5423032  1995-06 Byrd et al.  International Business Machines Corporation Method for extracting multi-word technical terms from text
Buy PDF- 9pp US5442781  1995-08 Yamagata  NEC Corporation System for generating a search formula by accessing search terms on the basis of a training set of pertinent and non-pertinent objects
Buy PDF- 17pp US5546578  1996-08 Takada  Nippon Steel Corporation Data base retrieval system utilizing stored vicinity feature values
Buy PDF- 15pp US5598557  1997-01 Doner et al.  Caere Corporation Apparatus and method for retrieving and grouping images representing text files based on the relevance of key words extracted from a selected file to the text files
Buy PDF- 22pp US5615341  1997-03 Agrawal et al.  International Business Machines Corporation System and method for mining generalized association rules in databases
       
Foreign References: None

Other Abstract Info: DERABS G98-260880 DERG98-260880

Other References:
  • Agrawal et al., An Interval Classifier for Database Mining Applications, Proc of the 18th VLDB Conference, Vancouver, British Columbia, Aug. 31, 1992, pp. 560-573.
  • Dietterich et al., Discovering Patterns in Sequences of Events, Artificial Intelligence, Elsevier Science Publishers B.V. (North Holland), pp. 187-232, 1985. (46 pages)
  • R. Agrawal et al., Mining Sequential Patterns, Int'l Conference on Data Engineering, pp. 3-14, Mar. 1995.
  • H. Mannila et al., Discovering Frequent Episodes in Sequences (Extended Abstract) Int'l Conference on Knowledge Discovery in Databases and Data Mining, Dec. 1993, KDD-95, pp. 210-215.
  • R. Agrawal et al., Fast Algorithms for Mining Association Rules, Proceedings of the 20th VLDB Conf., Santiago, Chile, pp. 487-499, 1994.
  • J Wang et al., Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results, Proc. of the ACM SIGMOD Conf. on Management of Data, Minneapolis, pp. 115-125, May 1994.
  • R. Agrawal et al., Mining Association Rules Between Sets of Items in Large Databases, Proc. of the ACM SIGMOD Conf. on Management of Data, pp. 207-216, Washington, D.C. May 1993.
  • R. Agrawal et al., Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Eng. Special Issue on Learning and Discovery in Knowledge-Based Databases, pp. 914-925, Dec. 1993. (12 pages) Cited by 35 patents [ISI abstract]
  • R. Srikant et al., Mining Generalized Association Rules, IBM Research Report 9963 (87922), Jun. 27, 1995.
  • M. Houtsma et al., Set-Oriented Mining for Association Rules, Research Report 9567 (83573) Oct. 22, 1993.


  • Inquire Regarding Licensing

    Powered by Verity


    Plaques from Patent Awards      Gallery of Obscure PatentsNominate this for the Gallery...

    Thomson Reuters Copyright © 1997-2010 Thomson Reuters 
    Subscriptions  |  Web Seminars  |  Privacy  |  Terms & Conditions  |  Site Map  |  Contact Us  |  Help