db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-4405) Transformation to inner join not performed for certain three-way joins
Date Sat, 12 Dec 2009 01:07:18 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789632#action_12789632
] 

Dag H. Wanvik commented on DERBY-4405:
--------------------------------------

Looking at the reordering done by Derby in
HalfOuterJoinNode#LOJ_reorderable, it seems to be buggy; filed
DERBY-4471 for that.

Instead of modifying fillInReferencedTableMap when used from
transformOuterJoins, I see that I possibly could use the method
ResultSetNode#LOJgetReferencedTables instead, which does look inside
join (inner and outer join) nodes in the way we want. I will make a
simplified patch which uses that method.

Btw, the simplication of outer join to inner join is defined by
identity #3 in [1] referred to above. 


> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html)
there are three examples that are supposed to be equivalent. However, the performance differs
significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in
50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination
of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally.
This is fine because all rows that have the right-side columns filled with NULLs will be filtered
out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join
will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic
so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message