hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From pengcheng xiong <pxi...@hortonworks.com>
Subject Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats
Date Tue, 24 Jan 2017 19:07:25 GMT


> On Jan. 24, 2017, 5:58 a.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3650
> > <https://reviews.apache.org/r/55731/diff/3/?file=1613301#file1613301line3650>
> >
> >     Thanks Chaoyu for the patch and comments. I still think we do not need the function
hasStatsInParameters. We only care about row_count and raw_data_size, especially row_count.
We use row_count to answer some query directly from metastore. The other parameters, totalSize
etc do not matter. The other part of the patch looks good to me. Thanks.
> 
> Chaoyu Tang wrote:
>     Thanks, PengCheng. In current code, command "alter table .. set tblproperty" can
also set the stats including row_count, raw_data_size, without setting the flag STATS_GENERATED
to USER  like it does for "alter table .. update statistics" in DDLSemanticAnalyzer. So in
this case, we need a way in DDLTask alterTableOrSinglePartition to determine whether this
property change is related to the stats. I used hasStatsInParameters, though it also include
the change of numFiles & totalSize into consideration.
>     I quite do not understand, if we only care about row_count/raw_data_size, why do
we recalcualte the fastStats (numFiles/totalSize) in HMS in a lot of cases?
>     Also I want to confirm that currently COLUMN_STATS_ACCURATE and BASIC_STATS only
reflect the stats accuracy of row_count and raw_data_size? If so, is there any use of numFiles/totalSize?

Thanks Chaoyu. You raised very good questions. (1) Although "alter table .. set tblproperty"
can also set the stats including row_count, in current code, it will remove basic stats =
true anyway. Thus it is safe. Now, "DO_NOT_UPDATE_STATS" is introduced in the patch. I think
we can move (not remove) the function that you introduced to rewrite L1364 to L1392 in DDLSemanticAnalyzer.
That is, if a user is going to change row_count, raw_data_size, no matter where it comes from,
set tblproperty or update statistics, we will set it to be coming from the USER. Later, we
can check it in alterTableOrSinglePartition. What do you think about this idea? (2) totalSize
etc are only useful when row_count/raw_data_size are not available. For more details, you
can refer to L156-L173. And, we only use row_count in StatsOptimizer. That is why "COLUMN_STATS_ACCURATE
and BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size" (actually,
it should be only row_count because we only use 
 row_count in StatsOptimizer). Thanks!


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162771
-----------------------------------------------------------


On Jan. 24, 2017, 5:01 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 5:01 a.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type
etc (besides the set table properties), the table stats status should not change. But for
some other operations like update statistics, change location, the basic stats status should
change.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message