hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashutosh Chauhan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2262) mapjoin followed by union all, groupby does not work
Date Tue, 01 Nov 2011 01:11:32 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140759#comment-13140759
] 

Ashutosh Chauhan commented on HIVE-2262:
----------------------------------------

That looks like a possible solution to me. I would like to see what others think as well.
Yu, can you generate a svn friendly patch. That will be easier to review.
                
> mapjoin followed by union all, groupby does not work
> ----------------------------------------------------
>
>                 Key: HIVE-2262
>                 URL: https://issues.apache.org/jira/browse/HIVE-2262
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.7.1
>            Reporter: yu xiang
>            Priority: Trivial
>
> sql:
> CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN, double_data
DOUBLE, string_data STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE nulltest3(int_data1 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> explain select int_data2,count(1) from (select /*+mapjoin(a)*/ int_data2, 1 as c1, 0
as c2 from nulltest2 a join nulltest3 b on(a.int_data1 = b.int_data1) union all select /*+mapjoin(a)*/
int_data2, 1 as c1, 2 as c2 from nulltest2 a join nulltest3 b on(a.int_data1 = b.int_data1))
mapjointable group by int_data2;
> exception:
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514)
>         at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125)
>         at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76)
>         at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
> Analyse the reason:
> 1.When use mapjoin,union,groupby together,the UnionProcFactory.MapJoinUnion()(optimizer)
will set the MapJoinSubq true, and set up the UnionParseContext.
> 2.In GenMRUnion1, hive will call mergeMapJoinUnion, and also set task plan.
> 3.In GenMRRedSink3, hive judges the uCtx.isMapOnlySubq(), and call GenMRRedSink1()).process()
to init the plan.But the utask's plan has been set yet, it just need to set reducer.And also
the utask is processing temporary table, there is no topOp map to table.So here we get null
exception.
> Solutions:
> 1.SQL solution:use a sub query to modify the sql;
> 2.Code solution:when in mergeMapJoinUnion, after the task plan have been set, set a settaskplan
flag true to indicate the plan for this utask has been set.When in GenMRRedSink3 ,if this
flag sets true, don't use the GenMRRedSink1()).process() to reinit the plan.
> ++++++++++++++++++++++++++++
> if (uCtx.isMapOnlySubq()&&!upc.isIssetTaskPlan())
> ++++++++++++++++++++++++++++
> I don't know whether the code solution is suitable.
> Is there any better solution?
> thx

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message