hive-issues mailing list archives

From "Chaoyu Tang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
Date Mon, 24 Apr 2017 22:31:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982021#comment-15982021 ]

Chaoyu Tang commented on HIVE-16147:
------------------------------------

The patch has been uploaded to RB. [~pxiong], could you please review it? Thanks.

> Rename a partitioned table should not drop its partition columns stats
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16147
>                 URL: https://issues.apache.org/jira/browse/HIVE-16147
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to sample_pt_rename), describing its partition shows that the partition column stats are still accurate, but they have actually all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS for all columns are true
> {code}
> ...
> # Detailed Partition Information	 	 
> Partition Value:    	[3]                 	 
> Database:           	default             	 
> Table:              	sample_pt           	 
> CreateTime:         	Fri Jan 20 15:42:30 EST 2017	 
> LastAccessTime:     	UNKNOWN             	 
> Location:           	file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> 	last_modified_by    	ctang               
> 	last_modified_time  	1485217063          
> 	numFiles            	1                   
> 	numRows             	100                 
> 	rawDataSize         	5143                
> 	totalSize           	5243                
> 	transient_lastDdlTime	1488842358    
> ... 
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column stats exist
> {code}
> # col_name            	data_type           	min                 	max                 	num_nulls           	distinct_count      	avg_col_len         	max_col_len         	num_trues           	num_falses          	comment             
> 	 	 	 	 	 	 	 	 	 	 
> salary              	int                 	1                   	151370              	0                   	94                  	                    	                    	                    	                    	from deserializer 
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): the renamed table's partition (dummy = 3) still shows COLUMN_STATS as true for all columns.
> {code}
> # Detailed Partition Information	 	 
> Partition Value:    	[3]                 	 
> Database:           	default             	 
> Table:              	sample_pt_rename    	 
> CreateTime:         	Fri Jan 20 15:42:30 EST 2017	 
> LastAccessTime:     	UNKNOWN             	 
> Location:           	file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3	 
> Partition Parameters:	 	 
> 	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> 	last_modified_by    	ctang               
> 	last_modified_time  	1485217063          
> 	numFiles            	1                   
> 	numRows             	100                 
> 	rawDataSize         	5143                
> 	totalSize           	5243                
> 	transient_lastDdlTime	1488842358  
> {code}
> 6. describe formatted default.sample_pt_rename partition (dummy = 3) salary: the column stats have been dropped.
> {code}
> # col_name            	data_type           	comment             	 	 	 	 	 	 	 	 
> 	 	 	 	 	 	 	 	 	 	 
> salary              	int                 	from deserializer   	 	 	 	 	 	 	 	 
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}
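
The mismatch shown in the output above (the COLUMN_STATS_ACCURATE parameter still says "true" for salary while the column-level describe returns no stats) can be checked mechanically. The following is only a rough sketch in Python, not part of the patch: the helper names parse_column_stats_accurate and flag_is_stale are hypothetical, and the has_stats argument is assumed to be filled in by hand from the "describe formatted ... salary" output.

```python
import json


def parse_column_stats_accurate(raw: str) -> dict:
    """Parse a COLUMN_STATS_ACCURATE value as printed by describe formatted.

    The describe output prints the JSON with backslash-escaped quotes,
    so the escaping is removed before parsing.
    """
    return json.loads(raw.replace('\\"', '"'))


def flag_is_stale(raw: str, column: str, has_stats: bool) -> bool:
    """Return True if the parameter claims accurate stats for `column`
    although no column stats were actually found (has_stats is False)."""
    params = parse_column_stats_accurate(raw)
    flagged = params.get("COLUMN_STATS", {}).get(column) == "true"
    return flagged and not has_stats


# Parameter value copied from the partition description in the issue;
# it is identical before and after the rename.
RAW = (
    r'{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",'
    r'\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}'
)

# Before the rename, stats for salary exist, so the flag is consistent.
print(flag_is_stale(RAW, "salary", has_stats=True))
# After the rename, the stats are gone but the flag is still "true".
print(flag_is_stale(RAW, "salary", has_stats=False))
```

With the values from the issue, the first call prints False and the second prints True, which is the inconsistency this issue describes.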



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
