tajo-issues mailing list archives

From "Jihoon Son (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TAJO-1632) Enable broadcast join planning for outer joins
Date Wed, 02 Sep 2015 01:51:46 GMT

     [ https://issues.apache.org/jira/browse/TAJO-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihoon Son updated TAJO-1632:
-----------------------------
    Fix Version/s:     (was: 0.11.0)
                   0.12.0

> Enable broadcast join planning for outer joins
> ----------------------------------------------
>
>                 Key: TAJO-1632
>                 URL: https://issues.apache.org/jira/browse/TAJO-1632
>             Project: Tajo
>          Issue Type: Improvement
>          Components: distributed query plan
>            Reporter: Jihoon Son
>             Fix For: 0.12.0
>
>
> TAJO-1553 was recently resolved to improve broadcast join planning, but it has a limitation
for outer joins: _preserved-row relations are not broadcastable, to avoid input data
duplication._ This rule can limit broadcast join opportunities. Consider the following
query as an example.
> {noformat}
> select * from a left outer join b left outer join c
> (a, b, and c are sufficiently small to be broadcast.)
> {noformat}
> Note that two consecutive left outer joins are associative. That is, their execution
order can be changed without changing the result. Thus, the candidate query plans are as follows
(LOJ is short for left outer join).
> 1)
> {noformat}
>       LOJ
>      /   \
>   LOJ     c
>  /   \
> a     b
> {noformat}
> 2)
> {noformat}
>   LOJ
>  /   \
> a     LOJ
>      /   \
>     b     c
> {noformat}
> In query plan 1), only *a* is a preserved-row relation. Thus, if query plan 1) is selected,
our current broadcast join planner turns the entire query plan into a single execution block
that broadcasts *b* and *c*.
> In contrast, if query plan 2) is selected, it is executed as two execution blocks,
each of which performs one left outer join, because *c* is the only relation that is not preserved-row and is thus broadcastable.
> Because the outcome depends on the shape of the selected query plan, this limitation can degrade the performance
of outer join processing.
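
The difference between the two plans can be reproduced with a small model. The sketch below is only an illustration and is not taken from the Tajo code base: the `Scan` and `LeftOuterJoin` classes and all function names are hypothetical. It assumes that, for each left outer join, the leftmost base relation of its left input is the preserved-row relation, and that a plan fits into one execution block when at most one relation is non-broadcastable. Both assumptions are simplifications chosen to match the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    """Leaf of the join tree: a base relation (hypothetical model class)."""
    name: str


@dataclass(frozen=True)
class LeftOuterJoin:
    """Left outer join node; the left input is the row-preserving side."""
    left: Plan
    right: Plan


Plan = Union[Scan, LeftOuterJoin]


def relations(plan: Plan) -> set[str]:
    """All base relations that appear in the plan."""
    if isinstance(plan, Scan):
        return {plan.name}
    return relations(plan.left) | relations(plan.right)


def leftmost(plan: Plan) -> str:
    """Name of the leftmost base relation of a subtree."""
    while isinstance(plan, LeftOuterJoin):
        plan = plan.left
    return plan.name


def preserved_row(plan: Plan) -> set[str]:
    """Assumption: each LOJ marks the leftmost relation of its left input."""
    if isinstance(plan, Scan):
        return set()
    return ({leftmost(plan.left)}
            | preserved_row(plan.left)
            | preserved_row(plan.right))


def broadcastable(plan: Plan) -> set[str]:
    """Relations allowed to be broadcast under the preserved-row rule."""
    return relations(plan) - preserved_row(plan)


def fits_single_block(plan: Plan) -> bool:
    """Simplification: one block if at most one relation is not broadcast."""
    return len(preserved_row(plan)) <= 1


a, b, c = Scan("a"), Scan("b"), Scan("c")
plan1 = LeftOuterJoin(LeftOuterJoin(a, b), c)  # (a LOJ b) LOJ c
plan2 = LeftOuterJoin(a, LeftOuterJoin(b, c))  # a LOJ (b LOJ c)

for label, plan in (("plan 1", plan1), ("plan 2", plan2)):
    print(label, sorted(broadcastable(plan)), fits_single_block(plan))
```

Under these assumptions, plan 1) reports *b* and *c* as broadcastable and fits in a single block, while plan 2) reports only *c* and does not, which matches the behavior described in the issue.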



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
