hive-dev mailing list archives

From "Xuefu Zhang" <xzh...@cloudera.com>
Subject Re: Review Request 27719: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
Date Fri, 21 Nov 2014 15:13:31 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/#review62557
-----------------------------------------------------------

Ship it!

- Xuefu Zhang


On Nov. 7, 2014, 9:16 p.m., Na Yang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27719/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2014, 9:16 p.m.)
> 
> 
> Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8756
>     https://issues.apache.org/jira/browse/Hive-8756
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> numRows and rawDataSize are not collected by the Spark stats. The cause is that the
> FileSinkOperator in the ReduceWork does not have the stats config set. In
> GenSparkUtils.removeUnionOperators, the operator tree is cloned, and a new
> FileSinkOperator is generated and set on the reduce work. However, during
> processFileSink, GenMapRedUtils.addStatsTask sets the collectStats flag on the
> original FileSinkOperator, not on the new FileSinkOperator that the ReduceWork
> actually uses.
> 
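The failure mode in the description can be illustrated with a minimal, self-contained sketch. It does not use Hive's classes: `Sink`, `copy`, `cloneThenFlagOriginal`, and `cloneThenFlagBoth` are hypothetical names invented for this illustration, and the remedy shown is only one possible approach, not necessarily what the patch does.

```java
public class ClonedSinkStatsSketch {

    // Hypothetical stand-in for a file sink operator; NOT Hive's FileSinkOperator.
    static final class Sink {
        boolean collectStats; // assumed name for the "collect stats" flag

        // Cloning copies the flag's value as it is at the moment of the clone.
        Sink copy() {
            Sink c = new Sink();
            c.collectStats = this.collectStats;
            return c;
        }
    }

    // Mirrors the reported order of events: the operator is cloned first, then
    // the flag is set on the original, so the clone that runs never sees it.
    static Sink cloneThenFlagOriginal(Sink original) {
        Sink usedByWork = original.copy();
        original.collectStats = true;
        return usedByWork;
    }

    // One possible remedy (an assumption, not necessarily the patch's approach):
    // carry the flag over to the clone that the work actually executes.
    static Sink cloneThenFlagBoth(Sink original) {
        Sink usedByWork = original.copy();
        original.collectStats = true;
        usedByWork.collectStats = original.collectStats;
        return usedByWork;
    }

    public static void main(String[] args) {
        System.out.println("flag on clone, reported order: "
            + cloneThenFlagOriginal(new Sink()).collectStats);
        System.out.println("flag on clone, after remedy:   "
            + cloneThenFlagBoth(new Sink()).collectStats);
    }
}
```

In the first case the executed clone keeps the flag unset, which matches the symptom of numRows and rawDataSize not being collected.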
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 79a0132 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 8290568 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7 
>   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 8d237c5 
>   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 4946815 
>   ql/src/test/results/clientpositive/spark/semijoin.q.out 9b6802d 
>   ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27719/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Na Yang
> 
>

