 |
 |
|
|
|
|
Title: |
US5797000:
Method of performing a parallel relational database query in a multiprocessor environment
[ Derwent Title ]

|
Country: |
US United States of America

|
| |
Inventor: |
Bhattacharya, Partha Pratim; Briarcliff Manor, NY
Chung, Jen-Yao; Yorktown Heights, NY
Pirahesh, Mir Hamid; San Jose, CA
Selinger, Patricia G.; San Jose, CA
Viveros, Marisa S.; Hartsdale, NY
Wang, Yun; Saratoga, CA
Zaino, Lawrence Peter; Poughkeepsie, NY

|
Assignee: |
International Business Machines Corporation, Armonk, NY
other patents from INTERNATIONAL BUSINESS MACHINES CORPORATION (280070) (approx. 44,393)
News, Profiles, Stocks and More about this company

|
Published / Filed: |
1998-08-18
/ 1996-06-20

|
Application Number: |
US1996000667056

|
IPC Code: |
Advanced:
G06F 17/30;
Core:
more...
IPC-7:
G06F 17/30;

|
ECLA Code: |
G06F17/30S4P3T5; G06F17/30S4P3P;

|
U.S. Class: |
Current:
707/002;
Original:
395/602;

|
Field of Search: |
395/602,670-678

|
Priority Number: |

|
Abstract: |
A method of performing a parallel join operation on a pair of relations R1 and R2 in a system containing P processors organized into Q clusters of P/Q processors each. The system contains disk storage for each cluster, shared by the processors of that cluster, together with a shared intermediate memory (SIM) accessible by all processors. The relations R1 and R2 to be joined are first sorted on the join column. The underlying domain of the join column is then partitioned into P ranges of equal size. Each range is further divided into M subranges of progressively decreasing size to create MP tasks Tm,p, the subranges of a given range being so sized relative to one another that the estimated completion time for task Tm,p is a predetermined fraction that of task Tm-1,p. Tasks Tm,p with larger time estimates are assigned (and the corresponding tuples shipped) to the cluster to which processor p belongs, while tasks with smaller time estimates are assigned to the SIM, which is regarded as a universal cluster (cluster 0). The actual task-to-processor assignments are determined dynamically during the join phase in accordance with the dynamic longest processing time first (DLPT) algorithm. Each processor within a cluster picks its next task at any given decision point to be the one with the largest time estimate which is owned by that cluster or by cluster 0.

|
Attorney, Agent or Firm: |
Kinnaman, Jr., W. A. ;

|
Primary / Asst. Examiners: |
Black, Thomas G.; Von Buhr, Maria N.

|
INPADOC Legal Status: |
None
Family Legal Status Report

|
 |
 |
|
|
|
|
Parent Case: |
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a division of application Ser. No. 08/148,769 filed Nov. 4, 1993, pending.
This application is related to the following commonly owned, concurrently filed applications, the specifications of which are incorporated herein by reference:
T. Borden, I. S. Narang, D. B. Rathi and D. J. Wisneski, "System and Method for Parallel Processing of Complex Read-Only Database Queries", Ser. No. 08/148,091, now U.S. Pat. No. 5,495,606.
J. L. Wolf, P. S. Yu. and J. J. Turek, "Task Scheduler for a Multiprocessor System", Ser. No. 08/148,108, now abandoned in favor of continuation application Ser. No. 08/293,257, filed Aug. 19, 1994, now U.S. Pat. No. 5,437,032.

|
Family: |
Show 2 known family members

|
First Claim:
Show all 18 claims |
What is claimed is:
1. In a database system having a plurality of processing units, a method of performing a parallel query on a plurality of tables of a database, each of said tables comprising a set of one or more tuples, at least some of said tables permitting partitioned access, said query having an overall cost associated therewith, said method comprising the steps of:
- determining the estimated contribution of each of said tables to the overall cost of said query;
- selecting the table with the greatest estimated contribution to the overall cost of said query that also permits partitioned access;
- partitioning the selected table into a plurality of subsets of one or more tuples;
- dividing said query into a plurality of subqueries restricted to respective subsets of the selected table; and
- transmitting said subqueries to respective processing units for processing.

|
Background / Summary: |
Show background / summary

|
Drawing Descriptions: |
Show drawing descriptions

|
Description: |
Show description

|
Forward References: |
Show 20 U.S. patent(s) that reference this one

|
 |
 |
|
|
|
|
Foreign References: |

|
Other Abstract Info: |
DERABS G98-347976

|
Other References: |
Blasgen et al., "Storage and Access in Relational Data Bases", IBM Systems Journal, vol. 16, No. 4, 1977, pp. 363-377.
(15 pages)
Cited by 8 patents
Chen et al., "Schema Integration and Query Decomposition in a Distributed Database System Using a Knowledge Based Approach", Information Modelling and Knowledge Bases III: Foundations, Theory and Applications, 1992, pp. 567-585.
Chen et al., "Two-Step Approach to Optimize Parallel Execution of Multi-Join Queries", IBM Technical Disclosure Bulletin, vol. 34, No. 10B, Mar. 1992, pp. 79-81.
Date, An Introduction to Database Systems, vol. 1, 4th Edition, 1986, pp. 132-136, 266-268, 341-343, 348-349.
DeWitt et al., "The Gamma Database Machine Project", IEEE Transactions on Knowledge and Data Engineering, vol. 2, No. 1., Mar. 1990, pp. 44-62.
Dias et al., "Methods for Improving the Efficiency of Parallel Sort Merge Joins in the Presence of Data Skew", IBM Technical Disclosure Bulletin, vol. 33, No. 10A, Mar. 1991, pp. 166-170.
Hummel et al., "Factoring: A Method for Scheduling Parallel Loops", Communications of the ACM, vol. 35, No. 8, Aug. 1992, pp. 90-101.
(12 pages)
Cited by 3 patents
[ISI abstract]
Kruskal et al., "Allocating Independent Subtasks on Parallel Processors", IEEE Transactions on Software Engineering, vol. SE-11, No. 10, Oct. 1985, pp. 1001-1016.
(16 pages)
Cited by 2 patents
Murphy et al., "Effective Resource Utilization for Multiprocessor Join Execution", Proceedings of the 15th International Conference on Very Large Data Bases, 1989, pp. 67-75.
Murphy et al. "Processor Scheduling for Multiprocessor Joins", Fifth International Conference on Data Engineering, 1991, pp. 140-148.
Polychronopoulos et al., "Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers", IEEE Transactions on Computers, vol. C-36, No. 12., Dec. 1987, pp. 1425-1439.
(15 pages)
Cited by 3 patents
Schneider et al., "Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines", Proceedings of the 16th VLDB Conference, 1990, pp. 469-480.
Swami et al., "Online Algorithms for Load Balancing the Join Operation", IBM Technical Disclosure Bulletin, vol. 34, No. 7B, Dec. 1991, pp. 278-280.
Tseng et al., "Parallel Database Processing on the KSR1 Computer", SIGMOD Record, vol. 22, No. 2, Jun. 1993, pp. 353-455.
Tzen et al., "Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers", IEEE Transactions on Parallel and Distributed Systems, vol. 4, No. 1., Jan. 1993, pp. 87-98.
(12 pages)
Cited by 2 patents
[ISI abstract]
Walton et al., "Data Skew and the Scalability of Parallel Joins", Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing, 1991, pp. 44-51.
Wolf et al., "A Parallel Sort Merge Join Algorithm for Managing Data Skew", IEEE Transactions on Parallel and Distributed Systems, vol. 4, No. 1, Jan. 1993, pp. 70-86.
(17 pages)
Cited by 3 patents
[ISI abstract]
Wolf et al., "An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew", Proceedings of the Seventh International Conference on Data Engineering, 1991, pp. 200-209.
Wolf et al., "An Effective Algorithm for Parallelizing Sort Merge Joins in the Presence of Data Skew", Proceedings of the Second International Symposium on Databases in Parallel and Distributed Systems, 1990, pp. 103-115.
Wolf et al., "Using a Surrogate Median to Speed Up the Execution of a Parallel Sort Merge Join Algorithm", IBM Technical Disclosure Bulletin, vol. 33, No. 9, Feb. 1991, pp. 215-217.

|


|
Nominate this for the Gallery...

|
|