hive-dev mailing list archives

From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key
Date Mon, 08 Aug 2011 06:54:29 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080779#comment-13080779 ]

jiraposter@reviews.apache.org commented on HIVE-2329:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1313/
-----------------------------------------------------------

Review request for hive.


Summary
-------

If map aggregation is disabled (hive.map.aggr=false), a GROUP BY that follows a DISTRIBUTE BY/CLUSTER BY on the same key fails
at runtime. The ReduceSinkDeDuplication optimization should be skipped when the child of the child RS (ReduceSink) is a GBY (GroupBy).
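The guard described above can be sketched in Java on a toy operator tree. This is a minimal sketch, not the code in HIVE-2329.2.patch: the `Kind`/`Op` classes, `childOfChildRsIsGroupBy`, and `shouldDeduplicate` are hypothetical names, and the sketch assumes "child of child RS" means an operator directly below the downstream (child) RS. The plan shape built in `main` is also an assumption about what the reported query compiles to with hive.map.aggr=false.

```java
import java.util.ArrayList;
import java.util.List;

public class DedupGuardSketch {

    // Hypothetical operator kinds named after Hive's plan abbreviations;
    // this is a toy model, not Hive's Operator hierarchy.
    enum Kind { TS, SEL, RS, EX, GBY, FS }

    static final class Op {
        final Kind kind;
        final List<Op> children = new ArrayList<>();

        Op(Kind kind) { this.kind = kind; }

        // Appends a new child operator and returns it so plans can be chained.
        Op then(Kind childKind) {
            Op child = new Op(childKind);
            children.add(child);
            return child;
        }
    }

    // Assumed reading of "child of child RS is GBY": an operator directly
    // below the downstream (child) RS is a GroupBy.
    static boolean childOfChildRsIsGroupBy(Op childRs) {
        for (Op c : childRs.children) {
            if (c.kind == Kind.GBY) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical guard: fold the child RS into its parent RS only when the
    // keys match and the child RS does not feed a GBY.
    static boolean shouldDeduplicate(Op childRs, boolean sameKeys) {
        if (childRs.kind != Kind.RS) {
            return false;
        }
        return sameKeys && !childOfChildRsIsGroupBy(childRs);
    }

    public static void main(String[] args) {
        // Assumed plan shape for the reported query with hive.map.aggr=false:
        // TS -> SEL -> RS (cluster by) -> EX -> SEL -> RS (group by) -> GBY -> FS
        Op ts = new Op(Kind.TS);
        Op groupByRs = ts.then(Kind.SEL).then(Kind.RS).then(Kind.EX)
                .then(Kind.SEL).then(Kind.RS);
        groupByRs.then(Kind.GBY).then(Kind.FS);
        System.out.println(shouldDeduplicate(groupByRs, true)); // false: the GBY needs this RS

        // A child RS that does not feed a GBY can still be deduplicated.
        Op otherRs = new Op(Kind.RS);
        otherRs.then(Kind.EX);
        System.out.println(shouldDeduplicate(otherRs, true)); // true
    }
}
```

Keeping the check as a separate predicate makes the bypass condition easy to test on its own, independent of the key-matching logic.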



This addresses bug HIVE-2329.
    https://issues.apache.org/jira/browse/HIVE-2329


Diffs
-----

  ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q PRE-CREATION 
  ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java e91b4d5 

Diff: https://reviews.apache.org/r/1313/diff


Testing
-------


Thanks,

Navis



> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch, HIVE-2329.2.patch
>
>
> hive.map.aggr=false
> select Q1.key_int1, sum(Q1.key_int1), sum(distinct Q1.key_int1) from (select * from t1 cluster by key_int1) Q1 group by Q1.key_int1
> results in:
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> From the Hadoop logs:
> Caused by: java.lang.RuntimeException: cannot find field key from []
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
> 	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:198)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
> ........
> I think the problem is caused by ReduceSinkDeDuplication removing the RS that was providing
> the key (rs.key) for the GBY operation. If the child of the child RS is a GBY, we should bypass the optimization.
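The reporter's explanation can be illustrated with a simplified model of the failing lookup: the GBY's column expression asks its input struct for a field named `key`, and once the RS that produced that field is removed, the lookup fails with a message shaped like the one in the trace. `MissingKeyFieldSketch.fieldIndex` is a hypothetical stand-in, not Hive's `ObjectInspectorUtils.getStandardStructFieldRef`, and the field names `key`/`value` are assumed for illustration.

```java
import java.util.List;

public class MissingKeyFieldSketch {

    // Hypothetical stand-in for a struct field lookup: return the position of
    // fieldName among the struct's field names, or fail with a message shaped
    // like the one in the reported stack trace.
    static int fieldIndex(List<String> fieldNames, String fieldName) {
        for (int i = 0; i < fieldNames.size(); i++) {
            if (fieldNames.get(i).equalsIgnoreCase(fieldName)) {
                return i;
            }
        }
        throw new RuntimeException("cannot find field " + fieldName + " from " + fieldNames);
    }

    public static void main(String[] args) {
        // With the RS in place, the reducer-side row carries a "key" field
        // (assumed field names).
        List<String> withRs = List.of("key", "value");
        System.out.println(fieldIndex(withRs, "key")); // 0

        // With the RS removed, the GBY still asks for "key", but the incoming
        // struct (empty here, as in the report) has no such field.
        try {
            fieldIndex(List.of(), "key");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // cannot find field key from []
        }
    }
}
```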
