hive-dev mailing list archives

From "Szehon Ho (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HIVE-8702) Extra MapTask created but not connected [Spark Branch]
Date Tue, 04 Nov 2014 00:22:33 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szehon Ho resolved HIVE-8702.
-----------------------------
    Resolution: Invalid

Took another look.  Suhas had wired up two resolvers, and both need to be enabled.  I had enabled
only the first one (SparkMapJoinOptimizer).  There is a second one, SparkReduceSinkMapJoinProc,
that also needs to be wired.  Once it's wired, the plan looks more appropriate.

> Extra MapTask created but not connected [Spark Branch]
> ------------------------------------------------------
>
>                 Key: HIVE-8702
>                 URL: https://issues.apache.org/jira/browse/HIVE-8702
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>
> Based on Szehon's observation, there is a strange extra MapTask generated but not connected. Here is the query to demonstrate:
> {code}
> select * FROM
> (SELECT avg(key) as x1, value as x2 FROM src group by value) x
> JOIN
> (SELECT avg(key) as y1, value as y2 FROM src group by value) y ON (x1 = y1)
> JOIN
> (SELECT avg(key) as z1, value as z2 FROM src group by value) z ON (x1 = z1);
> {code}
> We shouldn't generate it in the first place.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
