hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition
Date Fri, 20 Jul 2012 09:49:34 GMT


Namit Jain commented on HIVE-3286:

Navis, Nadeem is already working on this using a different approach.

I am not sure if there is a jira, but I know he is pretty close to getting one out.
> Explicit skew join on user provided condition
> ---------------------------------------------
>                 Key: HIVE-3286
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
> Join operations on tables with skewed data spend most of their execution time handling the skewed keys. But we usually know about the skew in advance, and even know what the skewed keys look like.
> If we could explicitly assign reducer slots to the skewed keys, total execution time could be greatly shortened.
> As a start, I've extended the join grammar like this:
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
> {code}
> This means that if the above query is executed by 20 reducers, one reducer handles a.key+1 < 50, one handles 50 <= a.key+1 < 100, one handles 99 <= a.key < 150, and the remaining 17 handle all other keys (this could be extended later to assign more than one reducer per condition).
> This can only be used with common inner equi-joins, and the skew condition must be composed of join keys only.
> The work done so far will be posted shortly after code cleanup.
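
The reducer assignment described in the quoted issue can be illustrated with a small sketch. This is not the HIVE-3286 patch and not Hive code. It is a hypothetical Python model that assumes the conditions are tested in order (first match wins), that each condition gets exactly one dedicated reducer, and that all remaining keys are spread over the leftover reducers by a simple hash. The names `make_skew_partitioner` and `SKEW_CONDITIONS` are invented for illustration.

```python
from typing import Callable, Sequence

# Hypothetical stand-ins for: skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150)
SKEW_CONDITIONS = [
    lambda key: key + 1 < 50,
    lambda key: key + 1 < 100,
    lambda key: key < 150,
]


def make_skew_partitioner(
    conditions: Sequence[Callable[[int], bool]], num_reducers: int
) -> Callable[[int], int]:
    """Return a function that maps an integer join key to a reducer index.

    Conditions are evaluated in order and the first match wins, so
    condition i effectively covers keys that match it and none of the
    earlier conditions. Reducer i is dedicated to condition i.
    """
    num_skew = len(conditions)
    if num_reducers <= num_skew:
        raise ValueError("need more reducers than skew conditions")
    num_rest = num_reducers - num_skew  # reducers left for non-skewed keys

    def partition(key: int) -> int:
        for i, cond in enumerate(conditions):
            if cond(key):
                return i
        # Assumption: non-skewed keys are hashed over the remaining reducers.
        return num_skew + (hash(key) % num_rest)

    return partition


partition = make_skew_partitioner(SKEW_CONDITIONS, num_reducers=20)

if __name__ == "__main__":
    for key in (10, 49, 99, 150):
        print(key, "-> reducer", partition(key))
```

With 20 reducers this gives reducers 0, 1, and 2 to the three skew conditions and leaves 17 for everything else, matching the arithmetic in the description. Because evaluation is first-match, the third condition `a.key < 150` only ever sees keys with `a.key >= 99`, which is where the `99 <= a.key < 150` range comes from.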


