Work Files Saved Searches
   My Account                                                  Search:   Quick/Number   Boolean   Advanced   Derwent    Help   


 The Delphion Integrated View

  Buy Now:   Buy PDF- 20pp  PDF  |   File History  |   Other choices   
  Tools:  Citation Link  |  Add to Work File:    
  View:  Expand Details   |  INPADOC   |  Jump to: 
  Go to:  Derwent  
 Email this to a friend  Email this to a friend 
       
Title: US6343288: Single pass space efficient system and method for generating an approximate quantile in a data set having an unknown size
[ Derwent Title ]


Country: US United States of America

View Images High
Resolution

 Low
 Resolution

 
20 pages

 
Inventor: Lindsay, Bruce Gilbert; San Jose, CA
Manku, Gurmeet Singh; Santa Clara, CA
Rajagopalan, Sridhar; San Jose, CA

Assignee: International Business Machines Corporation, Armonk, NY
other patents from INTERNATIONAL BUSINESS MACHINES CORPORATION (280070) (approx. 44,393)
 News, Profiles, Stocks and More about this company

Published / Filed: 2002-01-29 / 1999-03-12

Application Number: US1999000268089

IPC Code: Advanced: G06F 17/30;
Core: more...
IPC-7: G06F 17/30;

ECLA Code: G06F17/30S4P8D;

U.S. Class: Current: 707/007; 707/002; 707/006; 707/101;
Original: 707/007; 707/002; 707/006; 707/101;

Field of Search: 707/002,6,7,101

Priority Number:
1999-03-12  US1999000268089

Abstract:     A space-efficient system and method for generating an approximate phi-quantile data element of a data set in a single pass over the data set, without a priori knowledge of the size of the data set. The approximate phi-quantile is guaranteed to lie within a user-specified approximation error epsi of the true quantile being sought with a probability of at least 1-delta, with delta being a user-defined probability of failure. B buffers, each having a capacity of k elements, initially are filled with elements from the data set, with the values of b and k depending on approximation error e and the probability delta. The buffers are then collapsed into an output buffer, with the remaining buffers then being refilled with elements, collapsed (along with the previous output buffer), and so on until the entire data set has been processed and a single output remains. The element of the output corresponding to the approximate quantile is then output as the approximate quantile. In later iterations (when the height of the tree is at least equal to a predetermined height that depends on delta and epsi), the data is sampled non-uniformly to populate the buffers to render the desired performance. Parallel processors can be used, with the final output buffers of the processors being sent to a collecting processor P0 as input buffers to the collecting processor P0.

Attorney, Agent or Firm: Rogitz, John L. ;

Primary / Asst. Examiners: Mizrahi, Diane D.;

INPADOC Legal Status: Show legal status actions

Parent Case:

RELATED APPLICATIONS
    This application is related to U.S. patent application Ser. No. 09/050,434 filed Mar. 30, 1998 now U.S. Pat. No. 6,108,658 for an invention entitled "SINGLE PASS SPACE EFFICIENT SYSTEM AND METHOD FOR GENERATING AN APPROXIMATE QUANTILE IN A DATA STREAM THAT SATISFIES AN APRIORI USER-DEFINED APPROXIMATION ERROR", owned by the present assignee and incorporated herein by reference.

Family: None

First Claim:
Show all 48 claims
We claim:     1. A computer system, comprising:
  • a general purpose computer;
  • one or more input devices associated with the computer for generating a user specification, the specification establishing at least one desired approximate φ-quantile, a quantile approximation error ε, and a probability of failure δ, wherein the approximate φ-quantile is guaranteed to represent a true quantile of a data set and lie within the quantile approximation error ε with a probability of at least 1-δ;
  • a data set having a size, the size being unavailable to the computer in advance; and
  • computer usable code means executable by the computer for determining a φ-quantile data element in the data set, the computer usable code means having:
    • means for determining a number b of buffers and a size k of each buffer, and a number h, based at least in part on the permissible approximation error ε and the probability of failure δ;
    • means for filling empty buffers with data elements to establish a plurality of input buffers;
    • means for collapsing data elements in input buffers into at least one output buffer; and
    • means for outputting, from an output buffer, at least one φ-quantile data element.


Background / Summary: Show background / summary

Drawing Descriptions: Show drawing descriptions

Description: Show description

Forward References: Show 3 U.S. patent(s) that reference this one

       
U.S. References: Go to Result Set: All U.S. references   |  Forward references (3)   |   Backward references (13)   |   Citation Link

Buy
PDF
Patent  Pub.Date  Inventor Assignee   Title
Buy PDF- 12pp US4379948  1983-04 Ney et al.  U.S. Philips Corporation Method of and arrangement for deriving characteristic values from a sound signal
Buy PDF- 8pp US4530076  1985-07 Dwyer  The United States of America as represented by the Secretary of the Navy Frequency domain non-linear signal processing apparatus and method for discrimination against non-Gaussian interference
Buy PDF- 12pp US4817158  1989-03 Picheny  International Business Machines Corporation Normalization of speech signals
Buy PDF- 27pp US4829427  1989-05 Green  Data General Corporation Database query code generation and optimization based on the cost of alternate access methods
Buy PDF- 22pp US5018088  1991-05 Higble  The Johns Hopkins University Adaptive locally-optimum detection signal processor and processing methods
Buy PDF- 29pp US5091967  1992-02 Ohsawa  Dainippon Screen Mfg. Co., Ltd. Method of extracting contour of a subject image from an original
Buy PDF- 12pp US5105469  1992-04 MacDonald et al.  Crosfield Electronics Limited Control data array generation apparatus and method
Buy PDF- 16pp US5345585  1994-09 Iyer et al.  International Business Machines Corporation Method for optimizing processing of join queries by determining optimal processing order and assigning optimal join methods to each of the join operations
Buy PDF- 54pp US5379419  1995-01 Hefferman et al.  Digital Equipment Corporation Methods and apparatus for accesssing non-relational data files using relational queries
Buy PDF- 8pp US5664171  1997-09 Agrawal et al.  International Business Machines Corporation System and method for query optimization using quantile values of a large unordered data set
Buy PDF- 8pp US5864841  1999-01 Agrawal et al.  International Business Machines Corporation System and method for query optimization using quantile values of a large unordered data set
Buy PDF- 17pp US6108658  2000-08 Lindsay et al.  International Business Machines Corporation Single pass space efficent system and method for generating approximate quantiles satisfying an apriori user-defined approximation error
Buy PDF- 22pp US6195657  2001-02 Rucker et al.  Imana, Inc. Software, method and apparatus for efficient categorization and recommendation of subjects according to multidimensional semantics
       
Foreign References: None

Other References:
  • Fu, et al. (IEEE publication, 2001) paper entitled "Novel algorithms for computing medians and other quantiles of dis-resident data" in Database Engineering and Application, 2001 International Symposium, pp. 145-154.*
  • Publication: "A One-Pass Space-Efficient Algorithm for Finding Quantiles". Agrawal et al. Proceedings of the 7th International Conference on Management of Data (COMAD-95). India. 1995.
  • Publication: "A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data." Alsabti et al. pp. 346-355. Proceedings of the 23rd VLDB Conference. Athens, Greece, 1997.
  • Publication: "Efficient Sampling Strategies for Relational Database Operations." Lipton et al. pp. 195-226. Theoretical Computer Science. vol. 116. 1993. (32 pages) Cited by 8 patents [ISI abstract]
  • Publication: "A Minimum Storage Algorithm for Computing the Median." Pohl. IBM Research Center, Combinatorial Mathematics. Nov. 1969.
  • Book: "An Introduction to Probability Theory and its Applications". Feller. Princeton Univ. vol. 1, Third Edition. pp. 212-242. 1968.
  • Publication: "The P2 Algorithm for Dynamic Calculation of Quantiles and Histograms Without Storing Observations." Jain et al. Communications of the ACM. vol. 28, No. 10. pp. 1076-1085. Oct. 1985. (10 pages) Cited by 4 patents
  • Publication: "Mining Association Rules Between Sets of Items in Large Databases" Agrawal et al. pp. 207-216. ACM 1993.
  • Publication: "Equidepth Partitioning of a Data Set Based on Finding its Medians." Gurajada et al. pp. 92-101. IEEE. 1991.
  • Publication: "Quantile Estimation From Grouped Data: The Cell Midpoint." B. W. Schmeiser et al. pp. 221-234. Communications in Statistics: Simulation and Computation. pp. 221-234. 1977. (14 pages) Cited by 3 patents
  • Publication: "The Generation of Order Statistics in Digital Computer Simulation: A Survey." B.W. Schmeiser: Proceedings of The Winter Simulation Conference. pp. 137-140. 1978.
  • Publication: "Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries." Muralikrishna et al. pp. 28-36. ACM SIGMOD. 1988.
  • Publication: "Accurate Estimation of the Number of Tuples Satisfying a Condition." Piatetsky-Shapiro et al. pp. 256-275. SIGMOD. Proceedings of Annual Meeting. Boston, MA. 1984.
  • Book: "Selection and Sorting with Limited Storage". Munro et al. Theoretical Computer Science, vol. 12. pp. 315-323. 1980. (9 pages) Cited by 3 patents
  • Publication: "A Theory of the Learnable." L.G. Valiant. Communication of the ACM. vol. 27, No. 11. pp. 1134-1142. Nov. 1984. (9 pages) Cited by 3 patents
  • Publication: "Improved Histograms for Selectivity Estimation of Range Predicates." Poosala et al. SIGMOD. pp. 294-305. Montreal, Canada. 1996.
  • Publication: "Selecting the Median." Dor et al. Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete algorithms. Chpt. 4, pp. 28-37. San Francisco, CA. 1995.
  • Publication: "Parallel Sorting on a Shared-Nothing Architecture Using Probabilistic Splitting." DeWitt et al. Proceedings of the 1st International Conference on Parallel and Distributed Information Systems. pp. 280-291. Florida. Dec. 1991.
  • Publication: "Access Path Selection in a Relational Database Management System." Selinger et al. Proceedings of the ACM-SIGMOD International Conference on Management of Data. pp. 23-34. Boston, MA. 1979.
  • Publication: "Progress in Selection." Mike Paterson. Dept. Computer Science. Univ. Warwick. United Kingdom. 1977.


  • Inquire Regarding Licensing

    Powered by Verity


    Plaques from Patent Awards      Gallery of Obscure PatentsNominate this for the Gallery...

    Thomson Reuters Copyright © 1997-2010 Thomson Reuters 
    Subscriptions  |  Web Seminars  |  Privacy  |  Terms & Conditions  |  Site Map  |  Contact Us  |  Help