hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition
Date Sun, 22 Jul 2012 00:48:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420008#comment-13420008 ]

Namit Jain commented on HIVE-3286:
----------------------------------

@Navis, can you explain the semantics of the above grammar?
What do SKEWED BY and DISTRIBUTE BY imply?

Also, in the base case:

select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);

are you expecting the skewed keys to be those with key <= 49?
Is it true that the skewed keys will be handled only by their dedicated reducers?
If so, why would that reduce the execution time? The main advantage would be that those reducers
won't receive any other keys, so they won't be overburdened. Is that the idea?
                
> Explicit skew join on user provided condition
> ---------------------------------------------
>
>                 Key: HIVE-3286
>                 URL: https://issues.apache.org/jira/browse/HIVE-3286
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>
> A join on a table with skewed data spends most of its execution time handling the skewed keys.
But usually we already know about the skew, and even know what the skewed keys look like.
> If we can explicitly assign reducer slots to the skewed keys, the total execution time could be greatly shortened.
> As a start, I've extended the join grammar roughly like this:
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
> {code}
> This means that if the above query is executed by 20 reducers, there will be one reducer for a.key+1 < 50,
one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= a.key < 150, and 17 reducers for all other keys
(this could later be extended to assign more than one reducer per condition).
> This can be used only with common (reduce-side) inner equi-joins, and the skew conditions must be composed
of join keys only.
> The work done so far will be posted shortly, after some code cleanup.
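The reducer assignment described above can be modeled as a small partitioning function. This is a minimal sketch under stated assumptions, not the actual patch: it assumes integer join keys (the `src.key` column in the Hive test data is actually a string), that the skew conditions are checked in the order written with the first match winning, and that non-matching keys are hash-partitioned over the leftover reducers. The names `assign_reducer` and `conditions` are hypothetical.

```python
import zlib
from typing import Callable, List, Sequence

# Hypothetical model of the proposed "skew on (c1, c2, ...)" clause.
SkewCondition = Callable[[int], bool]


def assign_reducer(key: int, conditions: Sequence[SkewCondition], num_reducers: int) -> int:
    """Return the reducer index for a join key (assumed integer).

    Each skew condition owns one dedicated reducer (index 0, 1, ...),
    checked in order so the first matching condition wins. Keys that
    match no condition are hash-partitioned over the remaining reducers.
    """
    if num_reducers <= len(conditions):
        raise ValueError("need at least one reducer for non-skewed keys")
    for i, cond in enumerate(conditions):
        if cond(key):
            return i  # dedicated reducer for the i-th skew condition
    remaining = num_reducers - len(conditions)
    # Deterministic hash, so equal keys from both join sides meet on the same reducer.
    return len(conditions) + zlib.crc32(str(key).encode()) % remaining


# The three conditions from the example query, written against a.key.
conditions: List[SkewCondition] = [
    lambda k: k + 1 < 50,
    lambda k: k + 1 < 100,
    lambda k: k < 150,
]
```

Under these assumptions, with 20 reducers, key 48 lands on reducer 0, key 49 on reducer 1, key 99 on reducer 2, and key 150 on one of reducers 3 to 19. The ordered, first-match evaluation is what makes the second and third conditions behave like the ranges 50 <= a.key+1 < 100 and 99 <= a.key < 150 quoted above.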


        
