hive-dev mailing list archives

From "Na Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
Date Mon, 17 Nov 2014 20:08:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215094#comment-14215094
] 

Na Yang commented on HIVE-8756:
-------------------------------

Hi [~brocknoland], thank you for checking on it. I responded to Xuefu on RB some time ago,
but it turned out that the response was not published successfully. I have just republished it.

> numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-8756
>                 URL: https://issues.apache.org/jira/browse/HIVE-8756
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Na Yang
>            Assignee: Na Yang
>         Attachments: HIVE-8756.1-spark.patch, HIVE-8756.2-spark.patch
>
>
> Run the following Hive queries:
> {noformat}
> set datanucleus.cache.collections=false;
> set hive.stats.autogather=true;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set hive.map.aggr=true;
> create table tmptable(key string, value string);
> INSERT OVERWRITE TABLE tmptable
> SELECT unionsrc.key, unionsrc.value 
> FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
>       UNION  ALL  
>       SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
> DESCRIBE FORMATTED tmptable;
> {noformat}
> Hive on Spark prints the following table parameters:
> {noformat}
> COLUMN_STATS_ACCURATE	true                
> 	numFiles            	2                   
> 	numRows             	0                   
> 	rawDataSize         	0                   
> 	totalSize           	225
> {noformat}
> Hive on MR prints the following table parameters:
> {noformat}
> Table Parameters:
> 	COLUMN_STATS_ACCURATE	true                
> 	numFiles            	2                   
> 	numRows             	26                  
> 	rawDataSize         	199                 
> 	totalSize           	225 
> {noformat}
> As shown above, Hive on Spark reports numRows and rawDataSize as 0, whereas Hive on MR reports 26 and 199, so these two statistics are not collected by Hive on Spark.
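
The comparison above can also be checked mechanically. The following is only a minimal sketch, not part of the original report: it assumes the two DESCRIBE FORMATTED outputs have been pasted in as strings, and the helper names parse_table_params and diff_params are hypothetical, chosen for illustration.

```python
def parse_table_params(text):
    """Collect 'name value' pairs from pasted DESCRIBE FORMATTED output."""
    params = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and section headers such as "Table Parameters:".
        if not stripped or stripped.endswith(":"):
            continue
        fields = stripped.split()
        if len(fields) == 2:
            params[fields[0]] = fields[1]
    return params


def diff_params(left, right):
    """Return {name: (left_value, right_value)} for parameters that differ."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k))
            for k in keys if left.get(k) != right.get(k)}


# Values copied from the two outputs quoted in the issue description.
SPARK_OUTPUT = """
COLUMN_STATS_ACCURATE   true
    numFiles                2
    numRows                 0
    rawDataSize             0
    totalSize               225
"""

MR_OUTPUT = """
Table Parameters:
    COLUMN_STATS_ACCURATE   true
    numFiles                2
    numRows                 26
    rawDataSize             199
    totalSize               225
"""

spark = parse_table_params(SPARK_OUTPUT)
mr = parse_table_params(MR_OUTPUT)
mismatches = diff_params(spark, mr)

for name, (spark_value, mr_value) in mismatches.items():
    print(f"{name}: spark={spark_value} mr={mr_value}")
```

With the values quoted in the issue, only numRows and rawDataSize differ between the two engines; numFiles and totalSize agree.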



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
