db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <derby-...@db.apache.org>
Subject [jira] Updated: (DERBY-805) Push join predicates into union and other set operations. DERBY-649 implemented scalar (single table) predicate pushdown. Adding join predicate push down could improve performance significantly.
Date Wed, 01 Mar 2006 22:12:11 GMT
     [ http://issues.apache.org/jira/browse/DERBY-805?page=all ]

A B updated DERBY-805:
----------------------

    Attachment: d805_phase3_v1.patch
                d805_phase3_v1.stat

Attaching a "Phase 3" patch, d805_phase3_v1.patch, to implement the changes described in steps
2, 3, 4 and 6 of DERBY-805_v3.html.  In short, this patch does the following:

1. Implements the changes as described in the HTML document attached to this issue (steps
2, 3, 4 and 6).

2. Adds a check in PredicateList.java to skip scoped predicates when trying to find a predicate
to use for a hash join.  This check is needed because if the predicate was scoped and pushed
down from an outer query, then one of the operands points to the outer query and the other
(scoped) operand points down into a subquery, which means we can't do the hash join.  Without
this check the optimizer might choose to do a hash scan using the predicate even though it
won't actually be able to do the corresponding hash join.  That means we would lose the opportunity
to use the predicate for an index scan, and thus we could miss out on a much better (index-based)
plan.

3. Makes a slight change to OptimizerImpl.addOrLoadBestPlanMappings to avoid doing extra work
where possible.  More specifically, skips the logic to save/restore bestJoinOrder if there
is only one Optimizable in the list, as there's only one possible join order in that case
and so we don't need to keep track of which join order is best for each of the outer queries.

4. Adds a new test, lang/predicatePushdown.sql, to the harness as part of the derbylang suite.
 This test runs a bunch of queries that relate to the pushing of predicates down into unions.
 Most of these test cases failed (either with compilation/execution errors or incorrect results)
at one point or another while I worked on this issue, so I've chosen to include them all as
part of the new test.  The only way (that I know of) to tell if predicate pushdown is actually
occuring is to print out the query plans, so the master file for this test is very large--almost
10,000 lines.  For that reason, combined with the fact that I think additional predicate pushdown
tests will be needed as support for pushdown grows, I decided to create a new test instead
of adding the test cases to an existing test.

Once the Phase 1, Phase 2, and Phase 3 patches have been applied, along with the patch for
DERBY-1007, the basic predicate pushdown functionality will be in place.  That said, there
are still two outstanding issues into which I'm looking:

A - As described in DERBY-805_v3.html, Phases 1 thru 3 will make it so that join predicates
targeted for UNION nodes will always get pushed if it's possible to do so, which in turn means
that the optimizer will no longer consider the cost of doing a hash join with the UNION (the
hash join would require that the predicate _not_ be pushed into the UNION).  So I still need
to investigate this to see how I can make the optimizer cost pushing predicates and using
a nested loop join verses not pushing them and using a hash join.

B - The changes I've made for this issue--esp. for Phase 1--add a good amount of code to the
optimization code path.  This means that it's now possible for the optimizer to timeout "sooner"
than it would have prior to these changse.  By "sooner" I mean that the optimizer might not
have a chance to look at as many join orders as it used to (because it takes longer for it
to cost each join order, esp. with deeply nested subqueries).  This means that there are situations
where the optimizer pre-DERBY-805 may have found one plan but post-DERBY-805 may never get
to try that plan, and so it could end up choosing a worse plan than it used to. So I need
to investigate the optimizer "timeout" mechanism to see how I can address this issue.

As originally stated when I broke this enhancement down into phases, the issues that remain
after Phase 3 will be addressed in Phase 4 (and whatever follow-up phases are required thereafter).
In the meantime, the surest way to get the full effect of predicate pushdown is to disable
optimizer timeout, which can be done with the property derby.optimizer.noTimeout=true.  Note
that I set this property for the new predicatePushdown test, to ensure that the optimizer
will choose the same plans across all machines.

I ran derbyall on Red Hat Linux with IBM 1.4.2 and saw no new failures.  ** NOTE ** though
that this patch, d805_phase3_v1.patch, has a dependency on d805_phase2_v1.patch and also on
the patch for DERBY-1007, attached to that issue and called d1007_v1.patch.  So both of those
patches must be applied before d805_phase3_v1.patch can be applied.

I know the patch looks long, but keep in mind that a good 9,000 lines of the diff is just
for the new test.  I thought about posting the new test separately before the Phase 3 patch
and then submitting an updated master file for Phase 3, but then the diff there would be even
larger as most of the query plans would change.  So I've just included the new test as part
of the Phase 3 patch.

So in short, the current patches for review-and-commit are as follow:

1. d805_phase1_v3.patch (already committed)
2. d805_phase2_v1.patch
3. d1007_v1.patch (for DERBY-1007; dependent on #1)
4. d805_phase3_v1.patch (dependent on all of the previous patches).

If anyone has the time to review/comment on any of these patches, I would most certainly appreciate
it...

> Push join predicates into union and other set operations. DERBY-649 implemented scalar
(single table) predicate pushdown. Adding join predicate push down could improve performance
significantly.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-805
>          URL: http://issues.apache.org/jira/browse/DERBY-805
>      Project: Derby
>         Type: Sub-task
>   Components: SQL
>     Versions: 10.2.0.0, 10.1.2.0
>  Environment: generic
>     Reporter: Satheesh Bandaram
>     Assignee: A B
>      Fix For: 10.2.0.0
>  Attachments: DERBY-805.html, DERBY-805_v2.html, DERBY-805_v3.html, d805_phase1_v1.patch,
d805_phase1_v1.stat, d805_phase1_v2.patch, d805_phase1_v2.stat, d805_phase1_v3.patch, d805_phase1_v3.stat,
d805_phase2_v1.patch, d805_phase2_v1.stat, d805_phase3_v1.patch, d805_phase3_v1.stat
>
> Fix for DERBY-649 implemented scalar (single table) predicate push down into UNIONs.
While this improves performance for one set of queries, ability to push join-predicates further
improves Derby performance by enabling use of indices where possible.
> For example,
> create view V1 as select i, j from T1 union all select i,j from T2; 
> create view V2 as select a,b from T3 union all select a,b from T4; 
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5); 
> For a query like
> select * from V1, V2 where V1.j = V2.b and V1.i =1;
> If the join order choosen is V1,V2, V1 can use index on V1.i (if present) following fix
for DERBY-649. But if there is a index on V2.b also, Derby currently can't use that index.
By pushing join predicate, Derby would be able to use the index and improve performance. Some
of the queries I have seen (not the one shown here...) could improve from 70-120 seconds to
about one second.
> Note there is a good comment by Jeff Lichtman about join-predicate push down. I am copying
parts of it here for completeness of this report: (Modified)
> If predicate push down is done during optimization, it would be possible to push joins
into the union as long as it's in the right place in the join order.
> For example:
> create view v as select * from t1 union all select * from t2;
> select * from v, t3 where v.c1 = t3.c2;
> In this select, if t3 is the outer table then the qualification could be pushed into
the union and optimized there, but if t3 is the inner table the qualification can't be pushed
into the union.
> If the pushing is done at preprocess time (i.e. before optimization) it is impossible
to know whether a join qualification like this can be safely pushed.
> There's a comment in UnionNode.optimizeIt() saying:
> /* RESOLVE - don't try to push predicated through for now */
> This is where I'd expect to see something for pushing predicates into the union during
optimization.
> BTW, the business of pushing and pulling predicates during optimization can be hard to
understand and debug, so maybe it's best to only handle the simple cases and do it during
preprocessing.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Mime
View raw message