db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <derby-...@db.apache.org>
Subject [jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
Date Tue, 18 Jul 2006 20:22:19 GMT
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B updated DERBY-781:
----------------------

    Derby Info: [Patch Available, Release Note Needed]  (was: [Patch Available])

Possible RELEASE NOTE for this fix is as follows, based on suggestions from Bryan in the above-referenced
thread:

<begin_release_note>

DERBY-781: Materialize subqueries in select list where possible to avoid creating invariant
resultsets many times.

The Derby optimizer has been enhanced so that it now considers the cost of performing a hash
join with subqueries when it is safe to do so.  If the cost of the hash join is better than
a nested loop join, Derby will choose to do the hash join and will thereby materialize the
subquery.

WHAT CHANGED

When optimizing a query that has one or more non-flattenable subqueries in the FROM clause,
Derby will now check to see if it is possible to perform a hash join with that subquery as
the inner table.  Prior to Derby 10.2, the optimizer would never consider a hash join with
a subquery; it only did nested loop joins.

SYMPTOM

Execution performance of queries containing non-flattenable subqueries may change.  The expectation
is that the new (10.2) query plans will show improved performance over the old ones.

Another potential symptom is that the compilation time for such queries may increase.  If
this happens, the increase should only occur at compilation time; execution time should either
improve or, at the very least, remain the same as in earlier versions of Derby.

CAUSE

If the optimizer chooses to do a hash join with a subquery, Derby only has to execute the
subquery a single time per statement, after which Derby can just perform the desired join
against the materialized result set.  Depending on how many rows are in the outer table of
the join, this once-per-statement execution of the subquery can lead to major performance
improvements over the once-per-outer-row execution employed by earlier versions of Derby.

As for the extra compilation time, this is due to the simple fact that the optimizer is now
doing more work--i.e. in addition to considering nested loop joins with subqueries, it is
now _also_ considering hash joins with those subqueries, and that means that it could potentially
take longer for the optimizer to finish its work.  Note again that, if it occurs, the increased
time should only occur at compilation time; execution time should either improve or, at the
very least, remain the same as in earlier versions of Derby. 

SOLUTION 

This was an intentional change to improve the execution plans chosen by the optimizer for
queries having large and/or complex subqueries.  The expectation is that the new behavior--and
the subsequent query plans--will lead to improved performance over the old ones, so no further
solution is required.

WORKAROUND

There is no way to disable/workaround this new behavior since the symptom as described above
is a good one for Derby.

That said, any user who notices a negative performance change after moving to Derby 10.2,
and who believes that the difference in performance is related to this optimizer enhancement,
is encouraged to visit the following "performance diagnosis" page and to follow up with his/her
findings on the Derby mailing lists:

	http://wiki.apache.org/db-derby/PerformanceDiagnosisTips

<end_release_note>

> Materialize union subqueries in select list where possible to avoid creating invariant
resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant
resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is
possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner
table) This can be very costly if the underlying selects can take long time and also may perform
union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message