hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-964) handle skewed keys for a join in a separate job
Date Mon, 11 Jan 2010 17:44:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798742#action_12798742 ]

He Yongqiang commented on HIVE-964:
-----------------------------------

{quote}
if (alias == numAliases - 1 && !(this.handleSkewJoin && this.skewJoinKeyContext.skewKeyInCurrentGroup)) {
JoinOperator.java:

Do you need the change? Why do we need to handle skew for the last key?
{quote}

This change is not needed right now; it was a mistake left over from the last patch. In the last patch, joinOp wrote skew keys directly into HDFS, and no copy remained in local storage once the data had been written to HDFS. Since we now first store the data on local disk and upload it to HDFS at the end, we can remove this change.
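To make the "local disk first, upload at the end" idea concrete, here is a minimal sketch. It is not code from the patch: the class name `LocalThenUploadSpill` is hypothetical, and a local directory stands in for HDFS (in Hadoop the final step would likely be a `FileSystem.copyFromLocalFile` call instead). The point it shows is that, unlike writing straight to HDFS, the spilled rows stay readable locally until the writer is closed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/**
 * Hypothetical sketch: rows for a skewed key are written to a local temp file
 * and copied to the destination directory (standing in for HDFS) only on close.
 */
final class LocalThenUploadSpill implements AutoCloseable {
    private final Path localFile;
    private final Path remoteDir;
    private final BufferedWriter writer;

    LocalThenUploadSpill(Path remoteDir, String name) throws IOException {
        this.remoteDir = remoteDir;
        this.localFile = Files.createTempFile(name, ".spill");
        this.writer = Files.newBufferedWriter(localFile, StandardCharsets.UTF_8);
    }

    /** Appends one row; while the writer is open, the data lives only on local disk. */
    void write(String row) throws IOException {
        writer.write(row);
        writer.newLine();
    }

    /** The local copy can still be re-read before the upload happens. */
    List<String> readLocal() throws IOException {
        writer.flush();
        return Files.readAllLines(localFile, StandardCharsets.UTF_8);
    }

    /** Closes the writer, uploads the finished file in one step, then removes the local copy. */
    @Override
    public void close() throws IOException {
        writer.close();
        Files.createDirectories(remoteDir);
        Files.copy(localFile, uploadedPath(), StandardCopyOption.REPLACE_EXISTING);
        Files.delete(localFile);
    }

    /** Where the file ends up after close(). */
    Path uploadedPath() {
        return remoteDir.resolve(localFile.getFileName());
    }
}
```

Deferring the upload to `close()` means nothing half-written ever appears in the shared location, and the operator can still consult the local copy while processing the current group.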

I will work on the other comments. Thanks for the detailed comments!

> handle skewed keys for a join in a separate job
> -----------------------------------------------
>
>                 Key: HIVE-964
>                 URL: https://issues.apache.org/jira/browse/HIVE-964
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>         Attachments: hive-964-2009-12-17.txt, hive-964-2009-12-28-2.patch, hive-964-2009-12-29-4.patch, hive-964-2010-01-08.patch
>
>
> The skewed keys can be written to a temporary table or file, and a follow-up conditional task can be used to perform the join on those keys.
> As a first step, JDBM can be used for those keys.
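The issue description can be illustrated with a small in-process sketch of a two-table inner join. Keys whose left-side row count exceeds a threshold are set aside during the main pass and joined afterwards by a step that runs only if any such keys were found. Everything here is an assumption for illustration: `SkewJoinSketch`, `SKEW_THRESHOLD` and its value are hypothetical, and in-memory maps replace both the temporary table/file and JDBM. In Hive the follow-up step would be a separate conditional job rather than a second loop in the same process.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Hive's code): inner join on (key, value) rows where
 * skewed keys are diverted in the main pass and joined by a follow-up step.
 */
final class SkewJoinSketch {
    /** Assumed threshold for illustration only. */
    static final int SKEW_THRESHOLD = 3;

    /** Groups (key, value) pairs by key, preserving first-seen key order. */
    static Map<String, List<String>> groupByKey(List<String[]> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] r : rows) {
            groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        return groups;
    }

    static List<String> join(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> l = groupByKey(left);
        Map<String, List<String>> r = groupByKey(right);
        List<String> out = new ArrayList<>();
        // Stand-in for the temporary table/file that holds the skewed keys' rows.
        Map<String, List<String>> skewLeft = new LinkedHashMap<>();
        Map<String, List<String>> skewRight = new LinkedHashMap<>();

        // Main job: join ordinary keys directly, divert skewed keys.
        for (Map.Entry<String, List<String>> e : l.entrySet()) {
            String key = e.getKey();
            List<String> rightVals = r.getOrDefault(key, List.of());
            if (e.getValue().size() > SKEW_THRESHOLD) {
                skewLeft.put(key, e.getValue());
                skewRight.put(key, rightVals);
                continue;
            }
            emit(key, e.getValue(), rightVals, out);
        }

        // Follow-up conditional step: runs only when skewed keys were recorded.
        if (!skewLeft.isEmpty()) {
            for (String key : skewLeft.keySet()) {
                emit(key, skewLeft.get(key), skewRight.get(key), out);
            }
        }
        return out;
    }

    /** Emits the cross product of left and right values for one key (inner join). */
    static void emit(String key, List<String> lv, List<String> rv, List<String> out) {
        for (String a : lv) {
            for (String b : rv) {
                out.add(key + ":" + a + "," + b);
            }
        }
    }
}
```

For example, with five left rows for key `hot` and one for `cold`, the `cold` rows are joined in the main pass and the `hot` rows only in the follow-up step; the combined output is the same as a plain join, only the work is split so the hot key does not hold up the main job.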

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

