hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-964) handle skewed keys for a join in a separate job
Date Thu, 14 Jan 2010 19:03:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800305#action_12800305 ]

Namit Jain commented on HIVE-964:

1. ConditionalTask.java:80: if(DriverContext.isLaunchable(child))
   Shouldn't this be an assert instead?

2. SkewJoinResolver: shouldn't it check for HIVESKEWJOINKEY and return early if it is not set?

3. ExplainPlan should also show the subtasks of conditional tasks at the top-level stage.

4. It seems that after the skew join conditional task is added, the dependency between the original join and its old children is still kept; it can be removed.

5. The last alias/tag of the join does not need a conditional task, since it is the last one in the order.

6. Instead of serializing/deserializing mapredWork to copy it, it might be a good idea to add a clone() method to mapredWork; this can also be done in a follow-up patch.

7. GenMRSkewJoinProcessor.java:253: won't localPlan always be null?

8. Can there be a fetchWork in the conditional task?

9. processSkewJoin: do you think it would be cleaner to break it up into multiple functions?
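Items 2 and 5 together suggest a simple guard structure for the resolver. The sketch below is hypothetical: SkewJoinResolverSketch, aliasesNeedingConditionalTask and the key string "hypothetical.skewjoin.key" stand in for the patch's SkewJoinResolver and the HIVESKEWJOINKEY setting, and a plain Map stands in for the configuration object. It only shows returning early when the setting is absent and skipping the last alias in join order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of review items 2 and 5; names and the config key are illustrative only.
class SkewJoinResolverSketch {
    // Hypothetical key standing in for the HIVESKEWJOINKEY setting mentioned in the review.
    static final String SKEW_JOIN_KEY = "hypothetical.skewjoin.key";

    static List<String> aliasesNeedingConditionalTask(Map<String, String> conf,
                                                      List<String> joinAliases) {
        List<String> result = new ArrayList<>();
        // Item 2: get out early when the skew-join setting is not present.
        String setting = conf.get(SKEW_JOIN_KEY);
        if (setting == null || setting.isEmpty()) {
            return result;
        }
        // Item 5: every alias except the last one in join order gets a conditional task.
        for (int i = 0; i < joinAliases.size() - 1; i++) {
            result.add(joinAliases.get(i));
        }
        return result;
    }
}
```

Keeping the guard as the first statement means the resolver does no further plan inspection for queries that do not use the feature.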
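Item 6 contrasts two ways of copying a plan object. A minimal sketch, assuming a hypothetical PlanWork class in place of Hive's mapredWork and plain Java serialization in place of whatever serializer the patch actually uses: the round trip produces a deep copy but encodes and decodes every field, while an explicit clone() copies only what it needs and makes the copied state visible in code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plan object such as mapredWork; not Hive's class.
class PlanWork implements Serializable, Cloneable {
    private static final long serialVersionUID = 1L;

    String name;
    List<String> aliases = new ArrayList<>();

    PlanWork(String name) { this.name = name; }

    // Copy via a serialization round trip: deep, but pays for encoding/decoding every field.
    static PlanWork copyBySerialization(PlanWork work) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(work);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PlanWork) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("plan copy failed", e);
        }
    }

    // Copy via clone(): mutable fields must be copied explicitly so the copy does not share them.
    @Override
    public PlanWork clone() {
        try {
            PlanWork copy = (PlanWork) super.clone();
            copy.aliases = new ArrayList<>(aliases);
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: PlanWork implements Cloneable
        }
    }
}
```

With clone(), each mutable field has to be copied by hand (here, the alias list); forgetting one leaves the copy sharing state with the original, a mistake the serialization round trip cannot make.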

> handle skewed keys for a join in a separate job
> -----------------------------------------------
>                 Key: HIVE-964
>                 URL: https://issues.apache.org/jira/browse/HIVE-964
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>         Attachments: hive-964-2009-12-17.txt, hive-964-2009-12-28-2.patch, hive-964-2009-12-29-4.patch, hive-964-2010-01-08.patch, hive-964-2010-01-13-2.patch
> The skewed keys can be written to a temporary table or file, and a follow-up conditional task can be used to perform the join on those keys.
> As a first step, JDBM can be used for those keys.
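The idea in the issue description can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: rows are key/value string pairs, a key counts as skewed when it appears more than a threshold number of times on the left side, skewed rows are held in lists rather than written to a temporary table or file, and a HashMap stands in for JDBM. The main join handles the non-skewed keys and a follow-up step joins only the skewed keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, in-memory illustration of splitting skewed join keys into a
// follow-up step; not Hive's implementation.
class SkewJoinSketch {
    static final class Row {
        final String key;
        final String value;
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    // Keys occurring more than `threshold` times on one side are treated as skewed.
    static Set<String> skewedKeys(List<Row> rows, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (Row r : rows) counts.merge(r.key, 1, Integer::sum);
        Set<String> skewed = new HashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) skewed.add(e.getKey());
        }
        return skewed;
    }

    // Plain hash join: build a map on `build` (a HashMap stands in for JDBM), probe with `probe`.
    static List<String> hashJoin(List<Row> probe, List<Row> build) {
        Map<String, List<String>> table = new HashMap<>();
        for (Row b : build) table.computeIfAbsent(b.key, k -> new ArrayList<>()).add(b.value);
        List<String> out = new ArrayList<>();
        for (Row p : probe) {
            for (String v : table.getOrDefault(p.key, Collections.emptyList())) {
                out.add(p.key + ":" + p.value + ":" + v);
            }
        }
        return out;
    }

    static List<String> skewAwareJoin(List<Row> left, List<Row> right, int threshold) {
        Set<String> skewed = skewedKeys(left, threshold);
        List<Row> normalLeft = new ArrayList<>(), skewedLeft = new ArrayList<>();
        List<Row> normalRight = new ArrayList<>(), skewedRight = new ArrayList<>();
        for (Row r : left) (skewed.contains(r.key) ? skewedLeft : normalLeft).add(r);
        for (Row r : right) (skewed.contains(r.key) ? skewedRight : normalRight).add(r);
        // Main join: non-skewed keys only.
        List<String> out = hashJoin(normalLeft, normalRight);
        // Follow-up step: the set-aside skewed keys (stand-in for the temporary table/file).
        out.addAll(hashJoin(skewedLeft, skewedRight));
        return out;
    }
}
```

Because the two steps see disjoint key sets that together cover every key, concatenating their outputs yields the same rows as a single join; only the order may differ.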

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
