hive-dev mailing list archives

From "Na Yang" <>
Subject Review Request 27719: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
Date Fri, 07 Nov 2014 02:35:33 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.

Bugs: HIVE-8756

Repository: hive-git


numRows and rawDataSize are not collected by the Spark stats. This happens because the stats
config is not set on the FileSinkOperator in the ReduceWork. In GenSparkUtils.removeUnionOperators,
the operator tree is cloned, and a new FileSinkOperator is generated and set on the reduce
work. However, during processFileSink, GenMapRedUtils.addStatsTask sets the collectStats
flag on the original FileSinkOperator, not on the new FileSinkOperator that the ReduceWork uses.
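
The failure mode described above can be illustrated with a minimal, self-contained sketch.
The classes Sink and Work and the methods buggyOrder and flagTheClone are hypothetical
stand-ins invented for illustration; they are not Hive's FileSinkOperator, ReduceWork, or
any code from this patch. The sketch only shows why setting a flag on the original object
after a clone has been handed to the work leaves the executed copy unflagged. The second
method shows one possible remedy, which is not necessarily what the patch does.

```java
public class CloneStatsSketch {
    // Hypothetical stand-in for a file sink operator (not Hive's class).
    static final class Sink {
        final String name;
        boolean collectStats; // stand-in for the stats setting described above

        Sink(String name) {
            this.name = name;
        }

        // Copies the current state; later changes to the original are not seen.
        Sink copy() {
            Sink c = new Sink(name);
            c.collectStats = collectStats;
            return c;
        }
    }

    // Hypothetical stand-in for the work that actually runs a sink.
    static final class Work {
        Sink sink;
    }

    // Reproduces the ordering problem: clone first, then flag the original.
    static Work buggyOrder(Sink original) {
        Work work = new Work();
        work.sink = original.copy();   // the work now holds the clone
        original.collectStats = true;  // the flag lands on an object nobody runs
        return work;
    }

    // One possible remedy (an assumption): flag the instance the work holds.
    static Work flagTheClone(Sink original) {
        Work work = new Work();
        work.sink = original.copy();
        work.sink.collectStats = true;
        return work;
    }

    public static void main(String[] args) {
        System.out.println(buggyOrder(new Sink("fs")).sink.collectStats);   // false
        System.out.println(flagTheClone(new Sink("fs")).sink.collectStats); // true
    }
}
```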


  itests/src/test/resources/ 79a0132 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ 8290568 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/ e8e18a7 
  ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 




Na Yang
