hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chaoyu Tang <ctang...@gmail.com>
Subject Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats
Date Tue, 24 Jan 2017 04:58:25 GMT


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> >

XiongPeng, I updated the 2nd patches and also replied you comments, please take a look to
see if they make sense. Thanks


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3645
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3645>
> >
> >     If it is for user update stats, you will have     StatsSetupConst.STATS_GENERATED
= StatsSetupConst.USER automatically. Thus it is not necessary to have/call hasStatsInParameters
function.
> 
> Chaoyu Tang wrote:
>     Thanks PengCheng for reviewing the patch. I looked through the code related to this
STATS status and feel a little confusing, here are some questions I need help from you. To
my understanding:
>     1. For all operations which set STATS_GENERATED to TASK, we should set BASIC_STATS
as TRUE, right?
>     2. But for the stats row_count, raw_data_size updated by the command alter table
.. update statistics, the STATS_GENERATED is USER, should we set BASIC_STATS as TRUE or FALSE?
>     3. These stats could sometimes be set as parameters using command alter table ..
settblproperties, in these cases, the BASIC_STATS should be set FALSE?
> 
> pengcheng xiong wrote:
>     1. yes. what i mean here is that, you can just check if STATS_GENERATED==StatsSetupConst.USER
to determine whether it hasStatsInParameters. 
>     2. set it to FALSE. But note that when we call "StatsSetupConst.setBasicStatsState(params,
StatsSetupConst.FALSE);" we just remove the entry for "COLUMN_STATS_ACCURATE". Thus, set it
to false is equal to remove the entry.
>     3. yes. it should be set to FALSE. And as a result, there is no entry for "COLUMN_STATS_ACCURATE"

In item 3, these stats might include row_count, raw_data_size, numFiles, totalSize etc.


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3657
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3657>
> >
> >     And, could u just remove the property rather than set it to false? We treat
the missing of the property the same as set it to false.
> 
> Chaoyu Tang wrote:
>     alterTableOrSinglePartition could be called in a loop in the method alterTable(Hive
db, AlterTableDesc alterTbl) (see DDLTask line 3408). Setting it to false could avoid it to
be reset to true in the beginning of the method, I wonder if it makes sense.
> 
> pengcheng xiong wrote:
>     Yes, i understand and I agree with the logic that you want to do. My point is that,
rather than put <DO_NOT_UPDATE_STATS, FALSE> entry, we can just remove the key <DO_NOT_UPDATE_STATS>.
We will check whether the property contains that key later. See Line 219 in MetaStoreUtils.java.

removing the key <DO_NOT_UPDATE_STATS> instead of setting it to FALSE.


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/test/queries/clientpositive/alter_table_stats_status.q, line 1
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609804#file1609804line1>
> >
> >     Could u add some more test cases for partition stats? Thanks.
> 
> Chaoyu Tang wrote:
>     Will provide soon in next patch after testing.

Add the tests for partition stats.


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3655
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3655>
> >
> >     The same as above

See comments in code:
      // STATS_GENERATED==StatsSetupConst.USER is set only when the stats row_count, raw_data_size
      // are updated using alter table .. update statistics .. But the stats parameters including
      // totalSize etc could also be set using command like alter table .. set tblproperties,
in this
      // case, STATS_GENERATED will not be set, so we check hasStatsInParameters(alterTbl.getProps())
      // here which also covered the case were STATS_GENERATED==StatsSetupConst.USER
To see if it makes sense to use hasStatsInParameters instead of only checking STATS_GENERATED==StatsSetupConst.USER


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162357
-----------------------------------------------------------


On Jan. 19, 2017, 10:29 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 19, 2017, 10:29 p.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type
etc (besides the set table properties), the table stats status should not change. But for
some other operations like update statistics, change location, the basic stats status should
change.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message