Mailing-List: contact reviews-help@impala.apache.org; run by ezmlm
Delivered-To: mailing list reviews@impala.apache.org
Message-Id: <201711282246.vASMku1Z017723@ip-10-146-233-104.ec2.internal>
Date: Tue, 28 Nov 2017 22:46:55 +0000
From: "Alex Behm (Code Review)"
To: Dimitris Tsirogiannis, Balazs Jeszenszky, Vuk Ercegovac, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: alex.behm@cloudera.com, impala-cr@cloudera.com, marcelk@gmail.com, reviews@impala.incubator.apache.org, dtsirogiannis@cloudera.com, jeszyb@gmail.com, vercegovac@cloudera.com
Subject: [Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
X-Gerrit-MessageType: newpatchset
X-Gerrit-PatchSet: 4
X-Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
X-Gerrit-Change-Number: 8136
X-Gerrit-Commit: 34143874d1d8b2871f501795b2505551dc420b2f
User-Agent: Gerrit/2.14.2
archived-at: Tue, 28 Nov 2017 22:47:03 -0000

Hello Dimitris Tsirogiannis, Balazs Jeszenszky, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8136

to look at the new patch set (#4).

Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
......................................................................

IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.

Adds the TABLESAMPLE clause for COMPUTE STATS.

Syntax:
COMPUTE STATS TABLESAMPLE SYSTEM() [REPEATABLE()]

Computes and replaces the table-level row count and total file size,
as well as all table-level column statistics. Existing partition-level
row counts are not modified.

The TABLESAMPLE clause can be used to limit the scanned data volume to
a desired percentage. When sampling, the unmodified results of the
COMPUTE STATS queries are sent to the CatalogServer. There, the stats
are extrapolated before storing them into the HMS so as not to confuse
other engines like Hive/SparkSQL which may rely on the shared HMS
fields being accurate.

Limitations
- Only works for HDFS tables
- TABLESAMPLE is not supported for COMPUTE INCREMENTAL STATS
- TABLESAMPLE requires --enable_stats_extrapolation=true

Changes to EXPLAIN
The stored statistics from the HMS are more clearly displayed under
a 'stored statistics' section. Example:

00:SCAN HDFS [functional.alltypes, RANDOM]
   partitions=24/24 files=24 size=478.45KB
   stored statistics:
     table: rows=7300 size=478.45KB
     partitions: 24/24 rows=7300
     columns: all

Testing:
- added new functional tests
- core/hdfs run passed

Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
---
M be/src/exec/catalog-op-executor.cc
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
A testdata/workloads/functional-query/queries/QueryTest/compute-stats-tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test
M tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_explain.py
41 files changed, 1,928 insertions(+), 1,235 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8136/4
--
To view, visit http://gerrit.cloudera.org:8080/8136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Gerrit-Change-Number: 8136
Gerrit-PatchSet: 4
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Balazs Jeszenszky
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac
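
Illustrative note on the extrapolation step: the commit message says that sampled COMPUTE STATS results are extrapolated in the CatalogServer before being stored in the HMS, but it does not give the formula. The sketch below is only one plausible reading, assuming the sampled row count is scaled by the ratio of total file bytes to sampled file bytes. The function name and parameters are hypothetical and are not taken from the Impala code in this change.

```python
def extrapolate_row_count(sampled_rows: int, sampled_bytes: int, total_bytes: int) -> int:
    """Scale a row count observed on a byte-based sample up to the full table.

    Assumption (not confirmed by the commit message): extrapolation is linear
    in the number of file bytes scanned.
    """
    if sampled_rows < 0 or sampled_bytes < 0 or total_bytes < 0:
        raise ValueError("row and byte counts must be non-negative")
    if sampled_bytes > total_bytes:
        raise ValueError("sample cannot be larger than the table")
    if sampled_bytes == 0:
        # Nothing was scanned, so there is nothing to scale.
        return 0
    return round(sampled_rows * total_bytes / sampled_bytes)


if __name__ == "__main__":
    # Hypothetical numbers: 730 rows seen while scanning 10% of the bytes.
    print(extrapolate_row_count(730, 48_000, 480_000))
```

Under this assumption a 10% sample that yields 730 rows extrapolates to 7300 rows. The value written to the HMS would then be the scaled estimate rather than the raw sampled count, which matches the stated goal of keeping the shared HMS fields meaningful for Hive and SparkSQL.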