db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <derby-...@db.apache.org>
Subject [jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
Date Sat, 01 Jul 2006 13:56:31 GMT
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B updated DERBY-781:
----------------------

    Attachment: d781_v1.patch
                d781_v1.stat
                DERBY-781_v1.html

Attaching a patch (d781_v1.patch) to address this issue by allowing the optimizer to consider
and choose hash joins with subqueries, which is a more general case of the specific union
example mentioned in the description for this issue.  In brief, the patch does this by following
up on the suggestions given by Jeff Lichtman in comments above and also in the following thread:

http://article.gmane.org/gmane.comp.apache.db.derby.devel/12208

Since result set materialization comes for "free" with hash joins, that fact we now allow
hash joins with subqueries (as of this patch) means that we implicitly have a way to materialize
the subquery result sets.

The details of the patch are included as DERBY-781_v1.html.  I added a simple test to lang/subquery.sql
to demonstrate that the optimizer can and will choose to do hash joins for subqueries, and
I updated one other master file--predicatesIntoViews--for which the optimizer is now choosing
a hash join instead of a nested loop.  Testing of "unsafe" hash joins (see section VII of
the document) and generation of correct plans is done through existing tests, esp. the lang/lojreorder.sql
test, which was very useful in helping to verify the correctness of the changes.

Note that I did not add the sample union query shown in the description for this issue to
the tests because when I run it against the current codeline, the optimizer will already choose
to do materialization of the UnionNode (via hash join) even without the patch for this issue,
and thus it didn't seem like that particular test case was useful.  The new test in subqery.sql
is more relevant because the optimizer will choose to do a nested loop join with the subquery
before my changes and will do a hash join after my changes, which seems to more accurately
reflect what this issue is about.

I ran derbyall using sane jars on Red Hat Linux with ibm142 and saw no new failures, and the
overall execution time does not change despite the extra work the optimizer is doing.

I would greatly appreciate any review/feedback people might have on these changes.  Thanks.

> Materialize union subqueries in select list where possible to avoid creating invariant
resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement

>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram
>     Assignee: A B
>  Attachments: DERBY-781_v1.html, d781_v1.patch, d781_v1.stat
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant
resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is
possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner
table) This can be very costly if the underlying selects can take long time and also may perform
union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Mime
View raw message