Title: |
US6308172:
Method and apparatus for partitioning a database upon a timestamp, support values for phrases and generating a history of frequently occurring phrases

|
Country: |
US United States of America

|
Inventor: |
Agrawal, Rakesh; San Jose, CA
Srikant, Ramakrishnan; San Jose, CA
Lent, Brian Scott; Union City, CA

|
Assignee: |
International Business Machines Corporation, Armonk, NY

|
Published / Filed: |
2001-10-23 / 1999-07-06

|
Application Number: |
US1999000348595

|
IPC Code: |
Advanced:
C06F 3/04;
Core:
C06F 3/00;
IPC-7:
G06F 17/30;

|
ECLA Code: |
C06F3/04;

|
U.S. Class: |
Current:
707/005;
704/004;
704/008;
704/009;
707/002;
707/006;
707/100;
707/102;
707/203;
715/236;
Original:
707/005;
707/002;
707/006;
707/100;
707/102;
707/203;
707/511;
707/536;
704/004;
704/008;
704/009;

|
Field of Search: |
707/001-10,100-104,200-206,500,511,536
704/001-10,205,221-223,240-245,251-252,257,267-268,276
706/045-52
711/118-123
712/002,12-13,240
714/020

|
Priority Number: |

|
Abstract: |
A method and apparatus for mining text databases, employing sequential pattern phrase identification and shape queries, to discover trends. The method passes over a desired database using a dynamically generated shape query. Documents within the database are selected based on specific classifications and user defined partitions. Once a partition is specified, transaction IDs are assigned to the words in the text documents depending on their placement within each document. The transaction IDs encode both the position of each word within the document as well as representing sentence, paragraph, and section breaks, and are represented in one embodiment as long integers with the sentence boundaries. A maximum and minimum gap between words in the phrases and the minimum support all phrases must meet for the selected time period may be specified. A generalized sequential pattern method is used to generate those phrases in each partition that meet the minimum support threshold. The shape query engine takes the set of phrases for the partition of interest and selects those that match a given shape query. A query may take the form of requesting a trend such as "recent upwards trend", "recent spikes in usage", "downward trends", and "resurgence of usage". Once the phrases matching the shape query are found, they are presented to the user.
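The transaction-ID scheme described above can be illustrated with a minimal sketch. The jump sizes (`SENTENCE_JUMP`, `PARAGRAPH_JUMP`), the function names, and the regex-based sentence splitting are all assumptions made for illustration; the abstract states only that the IDs encode word position together with sentence, paragraph, and section breaks. The idea shown is that a large jump at each boundary lets a simple maximum-gap test reject phrases that would span a boundary.

```python
import re

# Hypothetical jump sizes, not taken from the patent: each boundary adds an
# offset far larger than any maximum gap a user would plausibly specify.
SENTENCE_JUMP = 1_000
PARAGRAPH_JUMP = 100_000


def assign_transaction_ids(text):
    """Return (transaction_id, word) pairs for one document.

    Paragraphs are assumed to be separated by blank lines and sentences
    by '.', '!' or '?'.
    """
    pairs = []
    tid = 0
    for p_idx, paragraph in enumerate(text.split("\n\n")):
        if p_idx > 0:
            tid += PARAGRAPH_JUMP  # mark a paragraph break
        sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
        for s_idx, sentence in enumerate(sentences):
            if s_idx > 0:
                tid += SENTENCE_JUMP  # mark a sentence break
            for word in re.findall(r"[a-z0-9']+", sentence.lower()):
                tid += 1
                pairs.append((tid, word))
    return pairs


def within_gap(tid_a, tid_b, min_gap=1, max_gap=1):
    """True if the word at tid_b follows the word at tid_a within the gap limits."""
    return min_gap <= tid_b - tid_a <= max_gap


pairs = assign_transaction_ids(
    "Data mining works. It finds trends.\n\nNew paragraph here."
)
```

With the default gap of 1, `within_gap` accepts only adjacent words in the same sentence; widening `max_gap` allows intervening words but still cannot bridge a sentence or paragraph jump unless it exceeds the chosen offsets.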

|
Attorney, Agent or Firm: |
Gray Cary Ware & Freidenrich

|
Primary / Asst. Examiners: |
Choules, Jack; Channavajjala, Srirama

|
Maintenance Status: |
CC Certificate of Correction issued

|
Parent Case: |
This is a continuation of U.S. patent application Ser. No. 08/909,901, filed Aug. 12, 1997, which issued as U.S. Pat. No. 6,006,223 on Dec. 21, 1999.

|
Family: |
2 known family members

|
First Claim (1 of 9): |
What is claimed is:
1. A computer executed method for discovering trends in a database, comprising:
- mapping words in a plurality of words to a data-sequence of data contained in a data field and identifiable by a position identifier, the data-sequence having transactions where a transaction includes a set of items, a word being mapped to a single-item transaction in a data-sequence; and
- mapping phrases to a sequential-pattern of data contained in a data field and identifiable by a position identifier, the sequential-pattern of data having sets of items, a phrase being mapped to a sequential-pattern having one item in each set of items;
- partitioning a database into data fields based upon a timestamp, the timestamp specifying a data field location within the database;
- determining support values for phrases;
- identifying frequent phrases in a partition, a phrase being frequent if the presence of the phrase in data fields included in the partition exceeds a support value for the phrase;
- generating a history of the frequency of occurrence of each phrase; and
- finding phrases in the history that satisfy a trend.
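The partition, support, history, and trend steps of the claim can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: it counts contiguous word n-grams instead of running a generalized sequential pattern algorithm, it treats support as the fraction of documents in a partition that contain the phrase, and it tests only one hypothetical trend shape (a strictly rising tail). The names `phrases_in`, `phrase_histories`, and `upward_trend` and the sample documents are invented for the example.

```python
from collections import defaultdict


def phrases_in(text, n=2):
    """Set of contiguous n-word phrases in a document (a stand-in for GSP)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_histories(docs, n=2, min_support=0.3):
    """Partition (period, text) pairs by period and build support histories.

    Returns the sorted periods and a dict mapping each phrase to a list with
    one support value per period (0.0 where the phrase was not frequent).
    """
    partitions = defaultdict(list)
    for period, text in docs:
        partitions[period].append(text)  # partition on the timestamp
    periods = sorted(partitions)

    history = defaultdict(lambda: [0.0] * len(periods))
    for idx, period in enumerate(periods):
        texts = partitions[period]
        counts = defaultdict(int)
        for text in texts:
            for phrase in phrases_in(text, n):
                counts[phrase] += 1  # each document counts at most once
        for phrase, count in counts.items():
            support = count / len(texts)
            if support >= min_support:  # keep only frequent phrases
                history[phrase][idx] = support
    return periods, dict(history)


def upward_trend(series, window=3):
    """Hypothetical 'recent upward trend' shape: strictly rising tail."""
    recent = series[-window:]
    return len(recent) == window and all(
        a < b for a, b in zip(recent, recent[1:])
    )


docs = [
    (1997, "data mining is new"), (1997, "text search tools"),
    (1998, "data mining grows"), (1998, "data mining tools"),
    (1998, "text search"),
    (1999, "data mining everywhere"), (1999, "data mining trends"),
]
periods, history = phrase_histories(docs)
rising = sorted(p for p, series in history.items() if upward_trend(series))
```

A fuller shape query engine would accept other shapes, such as spikes or resurgence, as additional predicates over the same per-phrase history lists.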

|
Forward References: |
28 U.S. patent(s) reference this one
