Title: |
US5799301:
Apparatus and method for performing adaptive similarity searching in a sequence database

|
Country: |
US United States of America

|
Inventor: |
Castelli, Vittorio; White Plains, NY
Li, Chung Sheng; Ossining, NY
Yu, Philip Shi-Lung; Chappaqua, NY

|
Assignee: |
International Business Machines Corporation, Armonk, NY

|
Published / Filed: |
1998-08-25
/ 1995-08-10

|
Application Number: |
US1995000513583

|
IPC Code: |
Advanced:
G06F 17/30;
IPC-7:
G06F 17/30;

|
ECLA Code: |
G06F17/30S4P8D;

|
U.S. Class: |
Current:
707/006;
707/003;
707/005;
Original:
707/006;
707/003;
707/005;

|
Field of Search: |
395/606,611,600,25
707/006,3,5
364/724.11,724.011,728.03

|
Priority Number: |
1995-08-10
US1995000513583

|
Abstract: |
An apparatus and method segment each sequence to be stored in a database into nonoverlapping or minimally overlapping subsequences of equal length. Each subsequence is then normalized (for example, with respect to the energy or maximum amplitude of each sequence) and transformed into a series of coefficients in a feature space. The search is based on hierarchical correlation in the feature space between the target sequence and the subsequences. The correlation between the target sequence and the stored sequences is performed first at the lowest level of the hierarchy. At any given level, a match is declared when the correlation result is larger than a specified threshold. Sequences that fail to satisfy the matching criterion are discarded. The process continues at the next level until the highest level is reached. Because the search is hierarchical, a linear scan of the entire sequence can be avoided.
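
The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name the feature transform or define the hierarchy levels, so the sketch assumes a pyramid of pairwise averages (coarsest level treated as the "lowest"), energy normalization, and cosine similarity as the correlation measure. All function names are hypothetical.

```python
import math

def segment(seq, length):
    """Split seq into non-overlapping subsequences of equal length (any short tail is dropped)."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]

def normalize_energy(sub):
    """Scale a subsequence so that its energy (sum of squares) equals 1."""
    norm = math.sqrt(sum(x * x for x in sub))
    return [x / norm for x in sub] if norm > 0 else list(sub)

def pyramid(sub):
    """Assumed feature transform: repeated pairwise averaging, returned coarsest level first."""
    levels = [list(sub)]
    while len(levels[-1]) > 2:
        prev = levels[-1]
        levels.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)])
    return levels[::-1]

def correlate(a, b):
    """Cosine similarity between two coefficient vectors (assumed correlation measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def hierarchical_search(target, subsequences, threshold):
    """Return indices of subsequences that exceed the threshold at every level."""
    target_levels = pyramid(normalize_energy(target))
    candidates = [(i, pyramid(normalize_energy(s))) for i, s in enumerate(subsequences)]
    for level, target_coeffs in enumerate(target_levels):
        # Candidates that fail at this level are discarded and never compared at finer levels.
        candidates = [(i, lv) for i, lv in candidates
                      if correlate(target_coeffs, lv[level]) > threshold]
    return [i for i, _ in candidates]

stored = [1, 2, 3, 4, 4, 3, 2, 1, 2, 4, 6, 8]
subs = segment(stored, 4)
matches = hierarchical_search([1, 2, 3, 4], subs, threshold=0.9)
```

In this example the reversed segment `[4, 3, 2, 1]` is rejected at the coarse two-coefficient level, so its full-resolution coefficients are never compared. The scaled copy `[2, 4, 6, 8]` survives because energy normalization removes the amplitude difference.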

|
Attorney, Agent or Firm: |
Ludwin, Richard M. ;

|
Primary / Asst. Examiners: |
Black, Thomas G.; Coby, Frantz

|
Maintenance Status: |
E2 Expired

|
Family: |
None

|
First Claim (of 9 claims): |
What is claimed is:
1. A method for detecting a similarity between a target data sequence and one or more data sequences stored in a database comprising the steps of:
- retrieving a subset of the stored sequences based on the target sequence and an indexing technique, wherein each of said stored sequences and said target sequence have a numerical value, and wherein each of said stored sequences and said target sequence are stored in a feature space;
- correlating, based on the numerical values, between said target sequence and said one or more stored sequences at a first level of a predetermined hierarchy in said feature space;
- testing a result of said correlating step against a predetermined threshold value;
- declaring a match between said target sequence and said one or more stored sequences if said result of said correlating step is greater than said predetermined threshold value.
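
The four steps of claim 1 (index-based retrieval, correlation, threshold test, match declaration) can be sketched as follows. The claim does not specify the indexing technique, so this sketch substitutes a simple bucket index on the first feature coefficient as a stand-in; the bucket width, the function names, and the use of cosine similarity as the correlation are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def bucket(value, width=0.25):
    """Quantize a coefficient into an integer bucket key (hypothetical index key)."""
    return math.floor(value / width)

def build_index(features, width=0.25):
    """Map each bucket of the first coefficient to the ids of sequences falling in it."""
    index = defaultdict(list)
    for seq_id, feat in features.items():
        index[bucket(feat[0], width)].append(seq_id)
    return index

def retrieve(index, target_feat, width=0.25):
    """Step 1: retrieve the subset of stored sequences in the target's bucket or a neighbor."""
    b = bucket(target_feat[0], width)
    return [seq_id for key in (b - 1, b, b + 1) for seq_id in index.get(key, [])]

def correlation(a, b):
    """Step 2: correlate two feature vectors (cosine similarity, an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def find_matches(features, index, target_feat, threshold):
    """Steps 3 and 4: test each result against the threshold and declare matches."""
    return [seq_id for seq_id in retrieve(index, target_feat)
            if correlation(features[seq_id], target_feat) > threshold]

# Feature vectors at the first level of the hierarchy (made-up values).
features = {"a": (0.6, 0.8), "b": (0.8, 0.6), "c": (-0.6, 0.8)}
index = build_index(features)
candidates = retrieve(index, (0.6, 0.8))
matches = find_matches(features, index, (0.6, 0.8), threshold=0.98)
```

Here the index prunes sequence `c` before any correlation is computed, and the threshold test then rejects `b`, leaving only `a` as a declared match.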

|
Forward References: |
20 U.S. patents reference this one

|
Foreign References: |
None

|
Other References: |
Agrawal et al, Database mining: A Performance Perspective, IEEE, pp. 914-925, Dec. 1993.
Beckmann et al, The R*-tree: An Efficient and Robust Access Method for Points and Rectangles, Praktische Informatik, pp. 322-332, 1990.
Agrawal et al, Efficient Similarity Search in Sequence Databases, IBM, pp. 1-16, Mar. 1994.
Saridis et al, Analytic Formulation of Intelligent Machines as Neural Networks, IEEE, pp. 22-27, Dec. 1989.
