hive-dev mailing list archives

From "Na Yang" <ny...@maprtech.com>
Subject Review Request 27719: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
Date Fri, 07 Nov 2014 02:35:33 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/
-----------------------------------------------------------

Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Bugs: HIVE-8756
    https://issues.apache.org/jira/browse/Hive-8756


Repository: hive-git


Description
-------

numRows and rawDataSize are not collected by the Spark stats because the stats config is
never set on the FileSinkOperator in the ReduceWork. GenSparkUtils.removeUnionOperators
clones the operator tree, which generates a new FileSinkOperator that is then set on the
reduce work. During processFileSink, however, GenMapRedUtils.addStatsTask sets the
collectStats flag on the original FileSinkOperator, not on the new FileSinkOperator that
the ReduceWork actually uses.
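
The clone-then-flag problem described above can be reproduced in isolation. The following is
a minimal sketch only, not the code in this patch: Sink, cloneAndRecord, and enableStats are
hypothetical names standing in for the operator and the utility methods, and the
original-to-clone map is just one assumed way to keep the two copies in sync. See the diff
linked below for the actual change.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class CloneStatsSketch {

    /** Hypothetical stand-in for a file sink operator; not Hive's FileSinkOperator. */
    public static class Sink {
        public boolean gatherStats;

        public Sink copy() {
            Sink c = new Sink();
            c.gatherStats = this.gatherStats;
            return c;
        }
    }

    /** Clones a sink and remembers which clone belongs to which original. */
    public static Sink cloneAndRecord(Sink original, Map<Sink, Sink> clones) {
        Sink clone = original.copy();
        clones.put(original, clone);
        return clone;
    }

    /** Sets the stats flag on the original and, if one was recorded, on its clone. */
    public static void enableStats(Sink original, Map<Sink, Sink> clones) {
        original.gatherStats = true;
        Sink clone = clones.get(original);
        if (clone != null) {
            clone.gatherStats = true;
        }
    }

    public static void main(String[] args) {
        // The problem: a flag set on the original after cloning never reaches the clone.
        Sink original = new Sink();
        Sink usedByReduceWork = original.copy();
        original.gatherStats = true;
        System.out.println("without mapping: " + usedByReduceWork.gatherStats);

        // One assumed remedy: record original -> clone so the flag can be propagated.
        Map<Sink, Sink> clones = new IdentityHashMap<>();
        Sink original2 = new Sink();
        Sink usedByReduceWork2 = cloneAndRecord(original2, clones);
        enableStats(original2, clones);
        System.out.println("with mapping: " + usedByReduceWork2.gatherStats);
    }
}
```

An identity-based map is used in the sketch so that lookups depend on object identity rather
than on any equals/hashCode the operator class might define.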
 


Diffs
-----

  itests/src/test/resources/testconfiguration.properties 79a0132 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 8290568 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7 
  ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27719/diff/


Testing
-------


Thanks,

Na Yang

