spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-17581) Invalidate Statistics After Some ALTER TABLE Commands
Date Sun, 18 Sep 2016 06:37:20 GMT

     [ https://issues.apache.org/jira/browse/SPARK-17581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17581:
------------------------------------

    Assignee:     (was: Apache Spark)

> Invalidate Statistics After Some ALTER TABLE Commands
> -----------------------------------------------------
>
>                 Key: SPARK-17581
>                 URL: https://issues.apache.org/jira/browse/SPARK-17581
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Xiao Li
>
> In the recent statistics-related work, our focus has been on how to generate and store
> statistics. After an `ANALYZE TABLE` command, the statistics do not change unless users
> run the command again. Hive behaves differently: some DDL commands invalidate them. For
> example, `ALTER TABLE SET LOCATION` invalidates the statistics, including `numRows` and
> `rawDataSize`.
> {noformat}
> hive> describe formatted t2;
> ...
> Location:           	hdfs://6b68a24121f4:9000/user/hive/warehouse/t2	 
> Table Type:         	MANAGED_TABLE       	 
> Table Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	true                
> 	numFiles            	4                   
> 	numRows             	2                   
> 	rawDataSize         	2                   
> 	totalSize           	4                   
> 	transient_lastDdlTime	1464590855          
> ...
> {noformat}
> {noformat}
> hive> alter table t2 set location 'hdfs://6b68a24121f4:9000/user/hive/warehouse/t1';
> OK
> Time taken: 0.113 seconds
> {noformat}
> {noformat}
> hive> describe formatted t2;
> ...                	 
> Location:           	hdfs://6b68a24121f4:9000/user/hive/warehouse/t1	 
> Table Type:         	MANAGED_TABLE       	 
> Table Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	false               
> 	last_modified_by    	root                
> 	last_modified_time  	1474178025          
> 	numFiles            	4                   
> 	numRows             	-1                  
> 	rawDataSize         	-1                  
> 	totalSize           	4                   
> ...
> {noformat}
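
The change visible between the two `describe formatted` outputs above can be modeled in a few lines. This is only a rough Python sketch of the observed effect, not Hive's or Spark's implementation: the function name `invalidate_table_stats` and the representation of table parameters as a plain dict are assumptions made for illustration. It marks the statistics as inaccurate and resets `numRows` and `rawDataSize` to -1, while leaving `numFiles` and `totalSize` as they were, matching the output above.

```python
def invalidate_table_stats(params):
    """Return a copy of the table parameters with row-level statistics invalidated.

    Hypothetical helper: mirrors what the Hive output above shows after
    ALTER TABLE ... SET LOCATION, not any real Hive or Spark API.
    """
    updated = dict(params)  # do not mutate the caller's dict
    updated["COLUMN_STATS_ACCURATE"] = "false"
    # Row-level statistics can no longer be trusted, so reset them to -1.
    for key in ("numRows", "rawDataSize"):
        if key in updated:
            updated[key] = "-1"
    # numFiles and totalSize are left untouched, as in the output above.
    return updated


# Table parameters as reported before the ALTER TABLE command.
before = {
    "COLUMN_STATS_ACCURATE": "true",
    "numFiles": "4",
    "numRows": "2",
    "rawDataSize": "2",
    "totalSize": "4",
}
after = invalidate_table_stats(before)
```

A catalog that wanted the same behavior could apply a step like this whenever a command changes where a table's data lives; which commands should trigger it is the open question of this issue.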



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

