hive-dev mailing list archives

From "Nadeem Moidu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3086) Skewed Join Optimization
Date Tue, 18 Sep 2012 16:52:08 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457947#comment-13457947 ]

Nadeem Moidu commented on HIVE-3086:
------------------------------------

Yes, in the current implementation both tables are scanned twice. This could be avoided
if the table scan operator were not replicated and instead had multiple children, but that
optimization is not part of this patch.
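
To make the double scan concrete, here is a minimal toy sketch in Python. It is not Hive's operator tree or the code in the patch; the names Table, hash_join, skew_join_two_scans, split_once and skew_join_one_scan are hypothetical and exist only for illustration. It assumes the rewrite splits the join into one branch for the skewed keys and one for all remaining keys, then unions the results.

```python
from collections import defaultdict


class Table:
    """Toy table of (key, value) rows that counts how often it is scanned."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)


def hash_join(left, right):
    """Inner-join two lists of (key, value) rows on key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, ())]


def skew_join_two_scans(a, b, skewed_keys):
    """Replicated scans: each branch filters its own full scan of each table."""
    # Branch 1: rows whose key is skewed.
    a_skew = [r for r in a.scan() if r[0] in skewed_keys]
    b_skew = [r for r in b.scan() if r[0] in skewed_keys]
    # Branch 2: all other rows. This is the second scan of each table.
    a_rest = [r for r in a.scan() if r[0] not in skewed_keys]
    b_rest = [r for r in b.scan() if r[0] not in skewed_keys]
    # Union of the two branch results.
    return hash_join(a_skew, b_skew) + hash_join(a_rest, b_rest)


def split_once(table, skewed_keys):
    """One scan with two children: route each row to the branch it belongs to."""
    skew, rest = [], []
    for row in table.scan():
        (skew if row[0] in skewed_keys else rest).append(row)
    return skew, rest


def skew_join_one_scan(a, b, skewed_keys):
    """Shared scan: each table is read once and feeds both branches."""
    a_skew, a_rest = split_once(a, skewed_keys)
    b_skew, b_rest = split_once(b, skewed_keys)
    return hash_join(a_skew, b_skew) + hash_join(a_rest, b_rest)
```

In this model both functions return the same joined rows; only the scan counters differ (two per table for the replicated form, one per table for the shared form).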
                
> Skewed Join Optimization
> ------------------------
>
>                 Key: HIVE-3086
>                 URL: https://issues.apache.org/jira/browse/HIVE-3086
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Nadeem Moidu
>            Assignee: Namit Jain
>             Fix For: 0.10.0
>
>         Attachments: hive.3086.1.patch, hive.3086.2.patch, hive.3086.3.patch, hive.3086.4.patch,
>                      hive.3086.5.patch, hive.3086.6.patch
>
>
> During a join operation, if one of the join columns has a skewed key, the reducer that
> handles that key can become the bottleneck. The following feature will address it:
> https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
