Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 21A1D200C04 for ; Tue, 24 Jan 2017 20:07:29 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 2021A160B38; Tue, 24 Jan 2017 19:07:29 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 67B0C160B3E for ; Tue, 24 Jan 2017 20:07:28 +0100 (CET) Received: (qmail 23688 invoked by uid 500); 24 Jan 2017 19:07:27 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 23480 invoked by uid 99); 24 Jan 2017 19:07:26 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Jan 2017 19:07:26 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 4CE3E316F32; Tue, 24 Jan 2017 19:07:25 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============3397931229848746246==" MIME-Version: 1.0 Subject: Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats From: pengcheng xiong To: pengcheng xiong Cc: hive , Chaoyu Tang Date: Tue, 24 Jan 2017 19:07:25 -0000 Message-ID: <20170124190725.13409.90744@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: pengcheng xiong X-ReviewGroup: hive X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/55731/ X-Sender: pengcheng xiong References: <20170124055855.13408.16020@reviews.apache.org> In-Reply-To: <20170124055855.13408.16020@reviews.apache.org> X-ReviewBoard-Diff-For: ql/src/test/results/clientpositive/alter_table_stats_status.q.out X-ReviewBoard-Diff-For: ql/src/test/queries/clientpositive/alter_table_stats_status.q Reply-To: pengcheng xiong X-ReviewRequest-Repository: hive-git archived-at: Tue, 24 Jan 2017 19:07:29 -0000 --===============3397931229848746246== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On Jan. 24, 2017, 5:58 a.m., pengcheng xiong wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3650 > > > > > > Thanks Chaoyu for the patch and comments. I still think we do not need the function hasStatsInParameters. We only care about row_count and raw_data_size, especially row_count. We use row_count to answer some query directly from metastore. The other parameters, totalSize etc do not matter. The other part of the patch looks good to me. Thanks. > > Chaoyu Tang wrote: > Thanks, PengCheng. In current code, command "alter table .. set tblproperty" can also set the stats including row_count, raw_data_size, without setting the flag STATS_GENERATED to USER like it does for "alter table .. update statistics" in DDLSemanticAnalyzer. So in this case, we need a way in DDLTask alterTableOrSinglePartition to determine whether this property change is related to the stats. I used hasStatsInParameters, though it also include the change of numFiles & totalSize into consideration. > I quite do not understand, if we only care about row_count/raw_data_size, why do we recalcualte the fastStats (numFiles/totalSize) in HMS in a lot of cases? > Also I want to confirm that currently COLUMN_STATS_ACCURATE and BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size? If so, is there any use of numFiles/totalSize? Thanks Chaoyu. You raised very good questions. (1) Although "alter table .. set tblproperty" can also set the stats including row_count, in current code, it will remove basic stats = true anyway. Thus it is safe. Now, "DO_NOT_UPDATE_STATS" is introduced in the patch. I think we can move (not remove) the function that you introduced to rewrite L1364 to L1392 in DDLSemanticAnalyzer. That is, if a user is going to change row_count, raw_data_size, no matter where it comes from, set tblproperty or update statistics, we will set it to be coming from the USER. Later, we can check it in alterTableOrSinglePartition. What do you think about this idea? (2) totalSize etc are only useful when row_count/raw_data_size are not available. For more details, you can refer to L156-L173. And, we only use row_count in StatsOptimizer. That is why "COLUMN_STATS_ACCURATE and BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size" (actually, it should be only row_count because we only use row_count in StatsOptimizer). Thanks! - pengcheng ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55731/#review162771 ----------------------------------------------------------- On Jan. 24, 2017, 5:01 a.m., Chaoyu Tang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55731/ > ----------------------------------------------------------- > > (Updated Jan. 24, 2017, 5:01 a.m.) > > > Review request for hive and pengcheng xiong. > > > Bugs: HIVE-15653 > https://issues.apache.org/jira/browse/HIVE-15653 > > > Repository: hive-git > > > Description > ------- > > For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change. > > > Diffs > ----- > > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 > ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 > ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION > ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION > > Diff: https://reviews.apache.org/r/55731/diff/ > > > Testing > ------- > > 1. Manual tests > 2. new unit tests > > > Thanks, > > Chaoyu Tang > > --===============3397931229848746246==--