Date: Fri, 3 Nov 2017 16:07:28 +0000
From: "Thomas Tauber-Marshall (Code Review)"
To: Michael Ho, Lars Volker, Matthew Jacobs, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: tmarshall@cloudera.com, impala-cr@cloudera.com, lv@cloudera.com, marcelk@gmail.com, kwho@cloudera.com, tarmstrong@cloudera.com, todd@apache.org, mmokhtar@cloudera.com, alex.behm@cloudera.com, reviews@impala.incubator.apache.org, mjacobs@apache.org
Subject: [Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Message-Id: <201711031607.vA3G7S8J003716@ip-10-146-233-104.ec2.internal>
Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm
X-Gerrit-MessageType: newpatchset
X-Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
X-Gerrit-Change-Number: 7793
X-Gerrit-PatchSet: 11
X-Gerrit-Commit: ab76f9ce3e40b022a8691cc5f2e82f7ac515ea96
User-Agent: Gerrit/2.14.2

Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, Alex Behm,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7793

to look at the new patch set (#11).

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
......................................................................

IMPALA-4252: Min-max runtime filters for Kudu

This patch implements min-max filters for runtime filters. Each
runtime filter generates a bloom filter or a min-max filter,
depending on whether it has HDFS or Kudu targets, respectively.

In RuntimeFilterGenerator in the planner, each hash join node
generates a bloom and min-max filter for each equi-join predicate,
but only those filters that end up being assigned to a target make
it into the final plan.

Min-max filters are only assigned to Kudu scans if the target expr
is a column, as Kudu doesn't support bounds on general exprs, and
only if the join op is '=' and not 'is distinct from', as Kudu
doesn't support returning NULLs if a bound is set.

Min-max filters are populated by the PartitionedHashJoinBuilder.
Codegen is used to eliminate branching on the type of filter. String
min-max filters truncate their bounds at 1024 chars, so that the max
amount of memory used by min-max filters is negligible.

For now, min-max filters are only applied at the KuduScanner, which
passes them into the Kudu client.

Future work will address applying min-max filters at HDFS scan nodes
and applying bloom filters at Kudu scan nodes.

Functional Testing:
- Added new planner tests and updated the old ones. (In the old
  tests, a lot of runtime filters are renumbered, as we always
  generate min-max filters even if they don't end up getting
  assigned, and they take up some of the RF ids.)
- Updated existing runtime filter tests to work with Kudu.
- Added e2e tests for min-max filter specific functionality.

Perf Testing:
- All tests run on Kudu stress cluster (10 nodes) and
  tpch_100_kudu, timings are averages of 3 runs.
- Ran a contrived query with a filter that does not eliminate any
  rows (full self join of lineitem). The difference in running time
  was negligible - 24.46s with filters on, 24.15s with filters off
  for a ~1% slowdown.
- Ran a contrived query with a filter that eliminates all rows (self
  join on lineitem with a join condition that never matches). The
  filters resulted in a significant speedup - 0.26s with filters on,
  1.46s with filters off for a ~5.6x speedup. This query is added to
  targeted-perf.

Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/runtime/timestamp-value.h
M be/src/service/impala-internal-service.cc
M be/src/util/CMakeLists.txt
A be/src/util/min-max-filter-ir.cc
A be/src/util/min-max-filter-test.cc
A be/src/util/min-max-filter.cc
A be/src/util/min-max-filter.h
M common/thrift/Data.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
A testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
A testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/bloom_filters_wait.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
A testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test
A testdata/workloads/targeted-perf/queries/primitive_min_max_runtime_filter.test
M tests/common/impala_test_suite.py
M tests/query_test/test_runtime_filters.py
M tests/util/test_file_parser.py
86 files changed, 3,690 insertions(+), 1,647 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/7793/11
--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 11
Gerrit-Owner: Thomas Tauber-Marshall
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Todd Lipcon
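
For readers who want a concrete picture of the string min-max filter described in the commit message above, here is a rough Python sketch. It is not the code from this patch (the real implementation is C++ under be/src/util/min-max-filter.*). The class name, method names and the always_true flag are hypothetical. The commit message only says that string bounds are truncated at 1024 chars; treating values as byte strings and rounding the truncated upper bound up so that it remains a valid bound are assumptions made for this illustration.

```python
# Illustrative sketch only: names are hypothetical, not the patch's C++ API.
MAX_BOUND_LEN = 1024  # truncation limit quoted in the commit message


class StringMinMaxFilter:
    """Tracks a [min, max] range over the byte strings seen on the build side."""

    def __init__(self, max_len=MAX_BOUND_LEN):
        self.max_len = max_len
        self.min = None            # None until the first insert
        self.max = None
        self.always_true = False   # assumption: give up if no finite upper bound fits

    def insert(self, value):
        if self.min is None or value < self.min:
            # A prefix sorts <= the full string, so plain truncation
            # still yields a valid lower bound.
            self.min = value[: self.max_len]
        if not self.always_true and (self.max is None or value > self.max):
            bound = self._truncated_upper_bound(value)
            if bound is None:
                self.always_true = True
            else:
                self.max = bound

    def _truncated_upper_bound(self, value):
        """Return a bound >= value that is at most max_len bytes, or None."""
        if len(value) <= self.max_len:
            return value
        prefix = bytearray(value[: self.max_len])
        # A bare prefix would sort below the original value, so round up:
        # drop trailing 0xFF bytes, then increment the last remaining byte.
        while prefix and prefix[-1] == 0xFF:
            prefix.pop()
        if not prefix:
            return None  # every byte was 0xFF; no short upper bound exists
        prefix[-1] += 1
        return bytes(prefix)

    def may_contain(self, value):
        """False only if value is definitely outside the inserted range."""
        if self.always_true:
            return True
        if self.min is None:
            return False  # nothing was inserted, so nothing can match
        return self.min <= value <= self.max


if __name__ == "__main__":
    f = StringMinMaxFilter(max_len=4)  # tiny limit to make truncation visible
    for key in (b"banana", b"cherry"):
        f.insert(key)
    print(f.min, f.max, f.may_contain(b"apple"), f.may_contain(b"cherry"))
```

The point of the sketch is only that a truncated bound has to stay conservative: the filter may let extra rows through, but it must never reject a row whose join key was actually inserted.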