Message-Id: <201604190525.u3J5PZ5e019003@ip-10-146-233-104.ec2.internal>
Date: Tue, 19 Apr 2016 05:25:35 +0000
From: "Henry Robinson (Code Review)"
To: Tim Armstrong, impala-cr@cloudera.com, dev@impala.incubator.apache.org
CC: Dan Hecht
Reply-To: henry@cloudera.com
Subject: [Impala-CR](cdh5-trunk) IMPALA-3077: Enable runtime filters when PHJ spills
X-Gerrit-MessageType: newpatchset
X-Gerrit-Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
X-Gerrit-Commit: a10233bd94eba04e4f7432a5d4a77cf9e928ccf8

Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/2783

to look at the new patch set (#6).

Change subject: IMPALA-3077: Enable runtime filters when PHJ spills
......................................................................

IMPALA-3077: Enable runtime filters when PHJ spills

This patch changes when runtime filters are produced in the partitioned
hash-join node to allow filters to be produced even when the PHJ
spills. Filters are now produced during the level0 processing of the
PHJ's build-side input in ProcessBuildBatch(). Since this function is
codegen'ed, so now is filter production. We use constant-propagation
via constant argument injection to disable filter production at no cost
when it is not needed (including in level1+ repartitioning). I
inspected the IR to confirm that the constant propagation works as
expected.

This change also allows us to send filters earlier during build-side
processing. A tradeoff is that filters are still built even if the
expected FP rate is too high, although any too-permissive filters are
still not sent to the scan (see 'Performance impact' below).
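
As a rough illustration of the constant-argument idea described above, the sketch below uses a compile-time boolean so that the filter-building branch disappears entirely when it is not needed. This is only an analogy in plain C++: the patch itself injects a constant into codegen'ed IR, and ToyFilter, ToyHash and ProcessBuildBatchSketch are hypothetical names that do not appear in the Impala source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a runtime filter: a small bitmap indexed by hash.
struct ToyFilter {
  std::vector<bool> bits = std::vector<bool>(1024, false);
  void Insert(uint32_t hash) { bits[hash % bits.size()] = true; }
  bool MayContain(uint32_t hash) const { return bits[hash % bits.size()]; }
};

// Hypothetical hash function (multiplicative hashing on unsigned 64-bit).
inline uint32_t ToyHash(int64_t v) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(v) * 0x9E3779B97F4A7C15ULL) >> 32);
}

// BUILD_FILTERS is a compile-time constant, so when it is false the compiler
// can drop the Insert() call and the loop body carries no filter cost. This
// mirrors, loosely, replacing a function argument with a constant before
// optimisation.
template <bool BUILD_FILTERS>
size_t ProcessBuildBatchSketch(const std::vector<int64_t>& rows,
                               ToyFilter* filter) {
  if (BUILD_FILTERS) assert(filter != nullptr);
  size_t rows_processed = 0;
  for (int64_t row : rows) {
    const uint32_t hash = ToyHash(row);
    if (BUILD_FILTERS) filter->Insert(hash);
    ++rows_processed;  // stands in for inserting the row into the hash table
  }
  return rows_processed;
}
```

Under these assumptions, a level0 pass would instantiate ProcessBuildBatchSketch<true> and a repartitioning pass ProcessBuildBatchSketch<false>, each compiled once with the branch already resolved.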

The restriction that prevented filters from being computed inside a
sub-plan is removed as part of this cleanup (since the FE handles
assigning filters correctly in subplans), and a test is added to
confirm that one of the correct cases for filters in subplans works.

This patch also fixes a bug where re-partitioning beyond level0 would
not use the codegen'ed implementation of ProcessBuildBatch().

A new test is added to test_runtime_row_filters, for Parquet only,
which spills and confirms that filtering still occurs.

Finally, the legacy --enable_phj_probe_side_filtering /
--enable_probe_side_filtering flags have been deprecated, as runtime
filtering can be permanently disabled via setting
RUNTIME_FILTER_MODE=OFF. The implementation that the old flags referred
to has been removed.

Performance impact
------------------

We benchmark the performance loss due to always computing runtime
filters even when the FP-rate will turn out to be too high as follows:

  select STRAIGHT_JOIN count(*) from
    (select id from functional.alltypes LIMIT 1) a
    JOIN [BROADCAST] (select * FROM p LIMIT 100000000) b
    on a.id = -b.id and b.part_col > 0

('p' is a two-column Parquet table with 1B rows).

This builds a 100M row build table (benchmarks run on one node). When
filtering is enabled, the filter is built but selects all rows from the
probe side (so that there's no benefit to having the filter, to
emphasise the cost of building the filter in the first place).

RUNTIME_FILTER_MODE    Avg. time (s) over 5 runs
OFF                    18.95
GLOBAL                 19.55
-------------------------------
Change                 +3%

Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
7 files changed, 130 insertions(+), 122 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/83/2783/6

--
To view, visit http://gerrit.cloudera.org:8080/2783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Tim Armstrong