hive-dev mailing list archives

From "Na Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
Date Thu, 06 Nov 2014 02:29:33 GMT
Na Yang created HIVE-8756:
-----------------------------

             Summary: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
                 Key: HIVE-8756
                 URL: https://issues.apache.org/jira/browse/HIVE-8756
             Project: Hive
          Issue Type: Bug
            Reporter: Na Yang


Run the following Hive queries:
{noformat}
set datanucleus.cache.collections=false;
set hive.stats.autogather=true;
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;
set hive.map.aggr=true;

create table tmptable(key string, value string);
INSERT OVERWRITE TABLE tmptable
SELECT unionsrc.key, unionsrc.value 
FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
      UNION  ALL  
      SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
DESCRIBE FORMATTED tmptable;
{noformat}

Hive on Spark prints the following table parameters:
{noformat}
COLUMN_STATS_ACCURATE	true                
	numFiles            	2                   
	numRows             	0                   
	rawDataSize         	0                   
	totalSize           	225
{noformat}

Hive on MR prints the following table parameters:
{noformat}
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	true                
	numFiles            	2                   
	numRows             	26                  
	rawDataSize         	199                 
	totalSize           	225 
{noformat}

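As shown above, Hive on Spark reports numRows and rawDataSize as 0, whereas Hive on MR reports 26 and 199 for the same query. These two statistics are not being collected by the Hive on Spark stats gathering.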
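
To make the discrepancy easy to check mechanically, here is a minimal Python sketch that parses the two outputs above and reports the parameters that differ. The helper names (parse_table_params, diff_params) and the embedded strings are illustrative assumptions, not part of Hive. The strings only copy the values printed in this report, with tabs written as \t.

```python
# Hypothetical helper (not part of Hive): turn the tab-separated parameter
# rows printed by DESCRIBE FORMATTED into a {name: value} dict.
def parse_table_params(text):
    params = {}
    for line in text.splitlines():
        # Drop empty and whitespace-only cells produced by padding tabs.
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        if len(fields) == 2:  # skips header rows such as "Table Parameters:"
            params[fields[0]] = fields[1]
    return params


# Hypothetical helper: return the parameters whose values differ,
# as {name: (value_in_a, value_in_b)}.
def diff_params(a, b):
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Values copied from the Hive on Spark output in this report.
SPARK_OUTPUT = (
    "COLUMN_STATS_ACCURATE\ttrue\n"
    "\tnumFiles\t2\n"
    "\tnumRows\t0\n"
    "\trawDataSize\t0\n"
    "\ttotalSize\t225\n"
)

# Values copied from the Hive on MR output in this report.
MR_OUTPUT = (
    "Table Parameters:\t \t \n"
    "\tCOLUMN_STATS_ACCURATE\ttrue\n"
    "\tnumFiles\t2\n"
    "\tnumRows\t26\n"
    "\trawDataSize\t199\n"
    "\ttotalSize\t225 \n"
)

if __name__ == "__main__":
    spark = parse_table_params(SPARK_OUTPUT)
    mr = parse_table_params(MR_OUTPUT)
    # Prints: {'numRows': ('0', '26'), 'rawDataSize': ('0', '199')}
    print(diff_params(spark, mr))
```

Only numRows and rawDataSize differ. numFiles and totalSize match, which suggests the file-level stats are gathered on both engines while the row-level ones are not.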



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
