From: "Alex Behm (Code Review)"
To: John Russell, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Silvius Rus, Mostafa Mokhtar, Greg Rahn
Date: Fri, 6 Oct 2017 04:20:20 +0000
Subject: [Impala-ASF-CR] [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS

Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml@1226
PS1, Line 1226: and the statistics are computed again from the beginning. Therefore, expect a one-time
from scratch


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/shared/impala_common.xml@1241
PS1, Line 1241: -- by -1 under #Rows and false under Incremental stats.
I suggest you leave out the "-1 under #Rows" part since it may be confusing. The reason is that DROP INCREMENTAL STATS will *not* modify the #Rows.

Here's how you can think about incremental stats: COMPUTE INCREMENTAL STATS populates the "regular" stats, such as the #rows and column NDVs, just as COMPUTE STATS does, but in addition it stores "incremental stats" to speed up the next COMPUTE INCREMENTAL STATS. So the "incremental" part is really this extra information, which you can drop separately from the "regular" stats.

One nice thing is that you can safely DROP INCREMENTAL STATS everywhere to reduce the size of the table metadata without impacting query plans, because the "regular" stats are preserved.


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml@611
PS1, Line 611: Because the COMPUTE STATS statement can be resource-intensive to run frequently
This advice isn't prescriptive enough for my taste. We should state very clearly that you should use either COMPUTE STATS or COMPUTE INCREMENTAL STATS, but never both. Switching during the lifetime of a table is *not* recommended, but if you really must do so, then we recommend you first drop all stats before the switch (using DROP STATS and DROP INCREMENTAL STATS).


http://gerrit.cloudera.org:8080/#/c/7999/1/docs/topics/impala_partitioning.xml@613
PS1, Line 613: that is optimized for processing partitioned tables.
I wouldn't say that incremental stats are "optimized" for partitioned tables. Foremost, incremental stats allow you to compute stats in a partition-by-partition fashion, which might be a better fit for a user's data ingestion pattern. However, we should be very clear about the cost of incremental stats. Incremental stats need ~400 bytes per column per partition in the table metadata (which gets disseminated and cached everywhere), so incremental stats are not a good fit for tables with a huge number of columns and partitions. If you have a partitioned table and only a few of the partitions are "active", then you can compute incremental stats for new partitions coming in and drop incremental stats for those partitions being "phased out", to limit your exposure to the metadata size problems.

You can even state that huge table metadata can crash the catalog and/or impalads due to the Java 2GB array size limit. (We're working on fixing that.)

Basically, I want to be sure that users understand the cost of incremental stats and the impact (a crash) when they go overboard with incremental stats. There is no graceful degradation here.


-- 
To view, visit http://gerrit.cloudera.org:8080/7999

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Greg Rahn
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Silvius Rus
Gerrit-Comment-Date: Fri, 06 Oct 2017 04:20:20 +0000
Gerrit-HasComments: Yes
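
To put rough numbers on the metadata cost raised in the comment on impala_partitioning.xml line 613, here is a minimal back-of-the-envelope sketch in Python. It uses only the two figures quoted in that comment (~400 bytes per column per partition, and the Java 2GB array size limit). The function names and the sample table shapes are hypothetical, and comparing the total directly against the array limit assumes, as a simplification, that all of a table's incremental stats end up in a single byte array; the actual serialization in Impala may differ.

```python
# Rough estimate of incremental-stats metadata size, using the approximate
# ~400 bytes per column per partition figure quoted in the review comment.
BYTES_PER_COLUMN_PER_PARTITION = 400  # approximate figure from the comment
JAVA_MAX_ARRAY_BYTES = 2**31 - 1      # Java arrays are indexed by a signed 32-bit int


def incremental_stats_bytes(num_columns: int, num_partitions: int) -> int:
    """Approximate bytes of incremental stats kept in the table metadata."""
    if num_columns < 0 or num_partitions < 0:
        raise ValueError("counts must be non-negative")
    return BYTES_PER_COLUMN_PER_PARTITION * num_columns * num_partitions


def exceeds_array_limit(num_columns: int, num_partitions: int) -> bool:
    """Simplifying assumption: all of a table's incremental stats
    are serialized into one byte array."""
    return incremental_stats_bytes(num_columns, num_partitions) > JAVA_MAX_ARRAY_BYTES


if __name__ == "__main__":
    # Hypothetical table shapes, chosen only for illustration.
    for cols, parts in [(50, 1_000), (200, 10_000), (500, 20_000)]:
        size = incremental_stats_bytes(cols, parts)
        print(f"{cols} columns x {parts} partitions: ~{size / 2**20:.0f} MiB, "
              f"over limit: {exceeds_array_limit(cols, parts)}")
```

Under these assumptions, a wide table with many partitions crosses the limit well before it looks unusual on paper, which is the point of the advice to drop incremental stats for partitions that are no longer active.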