hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] [Commented] (HIVE-3086) Skewed Join Optimization
Date Thu, 02 Aug 2012 10:48:03 GMT


Namit Jain commented on HIVE-3086:

@Yongqiang, the current skew join does the optimization after most of the
damage has already been done. The reducer detects that a particular key is
skewed, and then processes that key in a separate MR job.

However, in this approach, we are planning to know about the skewed keys
beforehand (stored in the metastore), and then use them to do a map-join for
the skewed keys and a normal join for the other keys. This does require some
change from the user (the user needs to store the skewed keys in the
metastore). However, this approach can be very good for repetitive workloads:
similar queries running every day for similar data. Most probably, the skew
does not change every day, so it can be calculated periodically.
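
To make the proposed split concrete, here is a small in-memory sketch in
Python. It is not Hive code and not the actual HIVE-3086 implementation. The
function names (hash_join, skew_aware_join) and the (key, value) row layout
are hypothetical, and it assumes the right-side rows for the skewed keys are
small enough to hold in memory, which is what makes a map-join possible. Both
branches use the same hash join here; in Hive the second branch would be the
normal shuffle join through the reducers.

```python
from collections import defaultdict


def hash_join(left, right):
    """Inner-join two lists of (key, value) rows by hashing the right side."""
    table = defaultdict(list)
    for key, value in right:
        table[key].append(value)
    # Probe the hash table once per left row; unmatched keys yield nothing.
    return [(key, lv, rv) for key, lv in left for rv in table.get(key, [])]


def skew_aware_join(left, right, skewed_keys):
    """Join skewed keys and the remaining keys separately, then union.

    skewed_keys stands in for the list the user would store in the metastore.
    """
    skewed = set(skewed_keys)

    # Split both inputs on whether the join key is known to be skewed.
    left_skewed = [row for row in left if row[0] in skewed]
    left_rest = [row for row in left if row[0] not in skewed]
    right_skewed = [row for row in right if row[0] in skewed]
    right_rest = [row for row in right if row[0] not in skewed]

    # Skewed keys: stand-in for the map-join (small side held in memory).
    skewed_out = hash_join(left_skewed, right_skewed)
    # Other keys: stand-in for the normal reduce-side join.
    rest_out = hash_join(left_rest, right_rest)

    return skewed_out + rest_out
```

Because the two branches cover disjoint sets of keys, their union contains
the same rows as a single join over the full inputs; only the execution
strategy for the skewed keys differs.
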
> Skewed Join Optimization
> ------------------------
>                 Key: HIVE-3086
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Nadeem Moidu
>            Assignee: Nadeem Moidu
> During a join operation, if one of the columns has a skewed key, it can cause that particular
> reducer to become the bottleneck. The following feature will address it:


