hive-dev mailing list archives

From "Laljo John Pullokkaran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5709) Extend Join merging logic to merge 2 Joins when one Join expression list is a subset of the other.
Date Thu, 31 Oct 2013 18:01:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810501#comment-13810501 ]

Laljo John Pullokkaran commented on HIVE-5709:
----------------------------------------------

This should ideally be a cost-based decision. Pulling one join key out of the join condition
and applying it as a filter has the following consequences:
Pro:
1. It saves the cost of one shuffle.

Con:
1. The degree of parallelism may be reduced, because the mapper's result set is partitioned
on the join key, and hashing on fewer keys can yield fewer distinct partitions:
    hf(a,b) != hf(a)

2. The intermediate result set may be large when some join keys are pulled above the join
and applied as a filter.

Given these factors, this should be a cost-based decision.
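
To make the two cons concrete, here is a minimal Python sketch. It is not Hive code: the
rows, the helper names (partition_sizes, join_size) and the idea of counting distinct
shuffle keys as a proxy for usable reducers are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical (name, key) rows; the values are made up for illustration.
rows = [("bolt", k) for k in range(6)] + [("nut", k) for k in range(2)]

def partition_sizes(rows, key_fn):
    """Group rows by shuffle key; each distinct key could go to its own reducer."""
    return Counter(key_fn(r) for r in rows)

def join_size(left, right, key_fn):
    """Number of rows produced by an inner equi-join of left and right on key_fn."""
    right_counts = Counter(key_fn(r) for r in right)
    return sum(right_counts[key_fn(l)] for l in left)

name_and_key = lambda r: (r[0], r[1])
name_only = lambda r: r[0]

by_name_and_key = partition_sizes(rows, name_and_key)
by_name_only = partition_sizes(rows, name_only)

# Con 1: fewer distinct shuffle keys, and a more heavily loaded largest partition.
print(len(by_name_and_key), max(by_name_and_key.values()))
print(len(by_name_only), max(by_name_only.values()))

# Con 2: joining on name alone produces more rows before the filter is applied.
print(join_size(rows, rows, name_and_key))
print(join_size(rows, rows, name_only))
```

With this toy data, partitioning on name alone leaves only two distinct shuffle keys, and the
join on name alone emits every same-name pair before the key filter removes most of them.
Whether that outweighs the saved shuffle depends on the data, which is the reason for
preferring a cost-based choice.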

> Extend Join merging logic to merge 2 Joins when one Join expression list is a subset of the other.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5709
>                 URL: https://issues.apache.org/jira/browse/HIVE-5709
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Harish Butani
>
> As pointed out by [~ashutoshc] here: https://reviews.apache.org/r/14953/
> For the following query
> {noformat}
> select p1.name, p2.name, p3.name
> from part p1 join part p2 on p1.name = p2.name and p1.key = p2.key join
> part p3 on p1.name = p3.name
> {noformat}
> 2 jobs are generated:
> - p1 join p2 on name, key
> - join p3 on name
> This can be done as:
> - 1 3-way join of p1,p2,p3 on name
> - followed by a Filter on p1.key = p2.key
> This is valid only for inner joins.
> This can be done by extending the Merge Join logic to check for a subset relation between 2 QBJoinTree expression lists.
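
As a quick check of the rewrite described in the issue, the following Python sketch
compares the two plans on a toy table. It models inner joins only, does not use any Hive
API, and the data and function names (two_step, merged) are hypothetical.

```python
from itertools import product

# Hypothetical (name, key) rows standing in for the part table.
part = [("bolt", 1), ("bolt", 2), ("nut", 1)]

def two_step(p1s, p2s, p3s):
    """Plan A: join p1 and p2 on (name, key), then join the result with p3 on name."""
    step1 = [(p1, p2) for p1, p2 in product(p1s, p2s)
             if p1[0] == p2[0] and p1[1] == p2[1]]
    return sorted((p1[0], p2[0], p3[0])
                  for (p1, p2), p3 in product(step1, p3s)
                  if p1[0] == p3[0])

def merged(p1s, p2s, p3s):
    """Plan B: one 3-way join on name, followed by a filter on p1.key = p2.key."""
    joined = [(p1, p2, p3) for p1, p2, p3 in product(p1s, p2s, p3s)
              if p1[0] == p2[0] and p1[0] == p3[0]]
    return sorted((p1[0], p2[0], p3[0])
                  for p1, p2, p3 in joined
                  if p1[1] == p2[1])

plan_a = two_step(part, part, part)
plan_b = merged(part, part, part)
print(len(plan_a), plan_a == plan_b)
```

Both plans return the same rows for this inner-join case; the sketch says nothing about
outer joins, where the issue notes the rewrite is not valid.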



--
This message was sent by Atlassian JIRA
(v6.1#6144)
