From: "Henry Robinson (Code Review)"
To: impala-cr@cloudera.com, dev@impala.incubator.apache.org
CC: Marcel Kornacker, Mostafa Mokhtar
Reply-To: henry@cloudera.com
Date: Fri, 29 Apr 2016 05:20:00 +0000
Subject: [Impala-CR](cdh5-trunk) IMPALA-3007: Adjust Bloom Filter size according to NDV estimate
Message-Id: <201604290520.u3T5K8Sf026706@ip-10-146-233-104.ec2.internal>
X-Gerrit-MessageType: newpatchset
X-Gerrit-Change-Id: I1fe37b8d4cfb3c52bb8e8cf0ca55e92665b87803
X-Gerrit-Commit: 603c1e6cbe850ffa1b8a6e2812c3a48478b1f664
User-Agent: Gerrit/2.10-rc0

Henry Robinson has uploaded a new patch set (#7).

Change subject: IMPALA-3007: Adjust Bloom Filter size according to NDV estimate
......................................................................

IMPALA-3007: Adjust Bloom Filter size according to NDV estimate

Instead of having a default Bloom Filter size for all runtime filters,
adjust the filter size according to the desired FP rate and the expected
NDV from the join's build side. The size of the filter is still clipped
to the 4k < N < 16MB range. If the NDV estimate from the planner is -1
(i.e. no stats), the default filter size is used.

The NDV of all filters produced by the same join is currently the same
because the NDV is estimated from the cardinality of the input. In the
future, the NDV should be estimated for each filter source expr. The BE
changes anticipate this and can enable or disable individual filters if
they have differing FP rates.
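
For readers unfamiliar with NDV-based sizing, the rule described above can be sketched roughly as follows. This is not the code in this patch set: it uses the textbook Bloom filter estimate bits = -n * ln(p) / (ln 2)^2, which may differ from the formula the patch applies. The function name SketchFilterSizeBytes, the default_bytes parameter, and the constant names are hypothetical. The sketch also assumes that "4k" means 4096 bytes, that the bounds are inclusive, and that fp_rate lies strictly between 0 and 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical bounds taken from the commit message ("4k < N < 16MB"),
// read here as bytes and treated as inclusive limits.
constexpr int64_t kMinFilterBytes = 4 * 1024;
constexpr int64_t kMaxFilterBytes = 16 * 1024 * 1024;

// Illustrative only: returns a filter size in bytes for an expected number
// of distinct values (ndv) and a target false-positive rate (fp_rate).
// Assumes 0 < fp_rate < 1. A negative ndv stands for "no stats".
int64_t SketchFilterSizeBytes(int64_t ndv, double fp_rate, int64_t default_bytes) {
  // The planner reports -1 when it has no stats; fall back to the default size.
  if (ndv < 0) return default_bytes;

  // Textbook estimate of the bits needed: -n * ln(p) / (ln 2)^2.
  const double ln2 = std::log(2.0);
  const double bits = -static_cast<double>(ndv) * std::log(fp_rate) / (ln2 * ln2);
  const double bytes = std::ceil(bits / 8.0);

  // Clip the result to the allowed range.
  const double clipped =
      std::min(std::max(bytes, static_cast<double>(kMinFilterBytes)),
               static_cast<double>(kMaxFilterBytes));
  return static_cast<int64_t>(clipped);
}
```

Under these assumptions, a very small NDV yields the minimum size, a very large NDV yields the maximum size, and an NDV of -1 returns whatever default the caller supplies.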
Change-Id: I1fe37b8d4cfb3c52bb8e8cf0ca55e92665b87803
---
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
M fe/src/main/java/com/cloudera/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
16 files changed, 230 insertions(+), 103 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/2812/7
--
To view, visit http://gerrit.cloudera.org:8080/2812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1fe37b8d4cfb3c52bb8e8cf0ca55e92665b87803
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Mostafa Mokhtar