hive-dev mailing list archives

From "Muthu (JIRA)" <>
Subject [jira] [Commented] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
Date Thu, 27 Feb 2014 02:25:21 GMT


Muthu commented on HIVE-6041:

This patch doesn't seem to work on Hive 0.12 for queries that use auto MAPJOIN. For example:

set hive.optimize.skewjoin=true;
set;
SELECT ru.userid, SUM(ru.total_count)
FROM BIGTABLE ru
JOIN SMALLTABLE c ON c.creative_id = ru.creative_id
JOIN placement_dapi p ON p.placement_id = c.placement_id
WHERE ru.realdate = '2014-01-02' AND ru.userid > 0
GROUP BY ru.userid;

Stage-1 is selected by condition resolver.
File does not exist: /tmp/hive-muthu.nivas/tmp/hive-muthu.nivas/hive_2014-02-26_18-17-04_075_3879899075227148508-1/-mr-10002
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
        at java.lang.reflect.Constructor.newInstance(
        at org.apache.hadoop.ipc.RemoteException.instantiateException(
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(
        at org.apache.hadoop.hdfs.DFSClient.getContentSummary(
        at org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(
        at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask(
        at org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin.getTasks(
        at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(
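
The trace shows ConditionalResolverCommonJoin.resolveMapJoinTask asking HDFS for the content summary (size) of an intermediate directory that does not exist. Below is a simplified Python model of that failure mode, assuming a resolver that sizes the small-side join inputs and picks a map join when their total fits under a threshold. The names `content_size`, `resolve_map_join`, the in-memory `fs` dict, and the paths are hypothetical stand-ins, not Hive's classes or the HDFS API.

```python
class FileNotFound(Exception):
    """Hypothetical stand-in for the HDFS 'File does not exist' error."""


def content_size(fs, path):
    """Mimic a size lookup such as getContentSummary(): fail on missing paths."""
    if path not in fs:
        raise FileNotFound("File does not exist: " + path)
    return fs[path]


def resolve_map_join(fs, small_paths, threshold):
    """Return 'mapjoin' if the small-side inputs fit under threshold bytes."""
    # Every small-side input is sized first; a single missing intermediate
    # directory aborts resolution instead of falling back to the common join.
    total = sum(content_size(fs, p) for p in small_paths)
    return "mapjoin" if total <= threshold else "commonjoin"


# Hypothetical sizes (bytes) for inputs that were actually written.
fs = {"/warehouse/smalltable": 10_000}
choice = resolve_map_join(fs, ["/warehouse/smalltable"], 25_000_000)
```

Here `choice` is "mapjoin". Adding a path that was never written (for example "/tmp/-mr-10002") to `small_paths` raises FileNotFound before any task is chosen, which is the shape of the error in the trace.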

> Incorrect task dependency graph for skewed join optimization
> ------------------------------------------------------------
>                 Key: HIVE-6041
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0
>         Environment: Hadoop 1.0.3
>            Reporter: Adrian Popescu
>            Assignee: Navis
>            Priority: Critical
>             Fix For: 0.13.0
>         Attachments: HIVE-6041.1.patch.txt
> The dependency graph among task stages is incorrect for the skew-join-optimized plan.
Skew join optimization is enabled through "hive.optimize.skewjoin". When no skewed keys
exist, all the tasks following the common join are filtered out at runtime.
> In particular, the conditional task in the optimized plan has no dependency on the
child tasks of the common join task in the original plan. The conditional task contains
the map join task, which holds all of these dependencies; but when the map join task is
filtered out (i.e., no skewed keys exist), all of these dependencies are lost. Hence, all
the remaining stages of the query (e.g., the move stage that writes the results into the
result table) are skipped.
> The bug resides in the processSkewJoin() function in "ql/optimizer/physical/",
immediately after the ConditionalTask is created and its dependencies are set.
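
The dependency loss described above can be reproduced with a toy task graph. This is a simplified Python sketch, not Hive's Task/ConditionalTask classes: the names `Task`, `ConditionalTask`, `run`, and `build_plan` are hypothetical, and the scheduler runs a task as soon as it is reached rather than waiting on all parents. The `fixed` flag adds one plausible repair (the conditional task also inherits the common join's original children); it is an assumption about the shape of a fix, not a description of HIVE-6041.1.patch.txt.

```python
from collections import deque


class Task:
    """A plan stage with downstream (child) tasks."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)


class ConditionalTask(Task):
    """Runs whichever candidate tasks its resolver selects at runtime."""

    def __init__(self, name, candidates, resolver):
        super().__init__(name)
        self.candidates = candidates
        self.resolver = resolver


def run(root):
    """Walk the DAG breadth-first and return task names in execution order."""
    ran, queue = [], deque([root])
    while queue:
        task = queue.popleft()
        if task.name in ran:
            continue
        ran.append(task.name)
        if isinstance(task, ConditionalTask):
            # Only resolver-selected candidates are scheduled.
            queue.extend(task.resolver(task.candidates))
        queue.extend(task.children)
    return ran


def build_plan(has_skewed_keys, fixed):
    common_join = Task("common-join")
    skew_map_join = Task("skew-map-join")
    move = Task("move")  # writes results into the result table

    # After the rewrite, only the skew map-join task carries the
    # original child dependency of the common join.
    skew_map_join.add_child(move)
    cond = ConditionalTask(
        "conditional",
        candidates=[skew_map_join],
        # No skewed keys: the resolver selects nothing.
        resolver=lambda c: c if has_skewed_keys else [],
    )
    common_join.add_child(cond)
    if fixed:
        # Hypothetical fix: the conditional task also depends-forward
        # on the common join's original children.
        cond.add_child(move)
    return common_join
```

With no skewed keys, `run(build_plan(False, False))` returns ["common-join", "conditional"], so the move stage is silently skipped as the report describes; with the extra edge it runs whether or not skewed keys exist.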

This message was sent by Atlassian JIRA
