hive-issues mailing list archives

From "Alexander Behm (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15653) Some ALTER TABLE commands drop table stats
Date Thu, 19 Jan 2017 23:01:26 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830771#comment-15830771 ]

Alexander Behm commented on HIVE-15653:
---------------------------------------

Good to know, thanks. Regarding ALTER TABLE SET LOCATION: I suppose it's arguable. IMO, we
should not add side effects to ALTER TABLE, especially when it comes to stats, because they
are expensive to compute. This is more of a product question, so maybe [~grahn] or [~skumar]
can weigh in.

> Some ALTER TABLE commands drop table stats
> ------------------------------------------
>
>                 Key: HIVE-15653
>                 URL: https://issues.apache.org/jira/browse/HIVE-15653
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 1.1.0
>            Reporter: Alexander Behm
>            Assignee: Chaoyu Tang
>            Priority: Critical
>         Attachments: HIVE-15653.patch
>
>
> Some ALTER TABLE commands drop the table stats. That may make sense for some ALTER TABLE
> operations, but certainly not for others. Personally, I think ALTER TABLE should only change
> what was requested by the user, without any side effects that may be unclear to users. In particular,
> collecting stats can be an expensive operation, so it's rather inconvenient for users if they
> get wiped accidentally.
> Repro:
> {code}
> create table t (i int);
> insert into t values(1);
> analyze table t compute statistics;
> alter table t set tblproperties('test'='test');
> hive> describe formatted t;
> OK
> # col_name            	data_type           	comment             
> 	 	 
> i                   	int                 	                    
> 	 	 
> # Detailed Table Information	 	 
> Database:           	default             	 
> Owner:              	abehm               	 
> CreateTime:         	Tue Jan 17 18:13:34 PST 2017	 
> LastAccessTime:     	UNKNOWN             	 
> Protect Mode:       	None                	 
> Retention:          	0                   	 
> Location:           	hdfs://localhost:20500/test-warehouse/t	 
> Table Type:         	MANAGED_TABLE       	 
> Table Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	false               
> 	last_modified_by    	abehm               
> 	last_modified_time  	1484705748          
> 	numFiles            	1                   
> 	numRows             	-1                  
> 	rawDataSize         	-1                  
> 	test                	test                
> 	totalSize           	2                   
> 	transient_lastDdlTime	1484705748          
> 	 	 
> # Storage Information	 	 
> SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
> InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
> OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
> Compressed:         	No                  	 
> Num Buckets:        	-1                  	 
> Bucket Columns:     	[]                  	 
> Sort Columns:       	[]                  	 
> Storage Desc Params:	 	 
> 	serialization.format	1                   
> Time taken: 0.169 seconds, Fetched: 34 row(s)
> {code}
> The same behavior can be observed with several other ALTER TABLE commands.
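
For anyone checking other ALTER TABLE commands for the same behavior, here is a minimal sketch that scans the text of `describe formatted` for the symptoms shown in the repro above (`numRows` and `rawDataSize` at -1). The function names `parse_table_parameters` and `stats_look_dropped` are hypothetical helpers, not part of Hive, and the parsing assumes the whitespace-separated layout seen in the output above.

```python
# Hypothetical helpers (not part of Hive): inspect `describe formatted` text
# for the symptoms shown in the repro, i.e. numRows / rawDataSize equal to -1.

def parse_table_parameters(describe_output):
    """Return the 'Table Parameters:' section as a dict of strings."""
    params = {}
    in_params = False
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Table Parameters:"):
            in_params = True
            continue
        if in_params:
            # The next '#' header (e.g. '# Storage Information') ends the section.
            if stripped.startswith("#"):
                break
            fields = stripped.split()
            if len(fields) >= 2:
                params[fields[0]] = fields[1]
    return params


def stats_look_dropped(params):
    """Assumption taken from the repro: -1 marks stats that are no longer known."""
    return params.get("numRows") == "-1" or params.get("rawDataSize") == "-1"


# Abbreviated copy of the output from the repro.
SAMPLE = "\n".join([
    "Table Parameters:",
    "\tCOLUMN_STATS_ACCURATE\tfalse",
    "\tnumFiles\t1",
    "\tnumRows\t-1",
    "\trawDataSize\t-1",
    "\ttotalSize\t2",
    "# Storage Information",
    "Compressed:\tNo",
])

if __name__ == "__main__":
    print(stats_look_dropped(parse_table_parameters(SAMPLE)))
```

Running the check once after `analyze table` and again after each ALTER TABLE variant should show which commands reset the values.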



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
