impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3077: Enable runtime filters when PHJ spills
Date Sat, 16 Apr 2016 00:32:16 GMT
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2783

to look at the new patch set (#5).

Change subject: IMPALA-3077: Enable runtime filters when PHJ spills
......................................................................

IMPALA-3077: Enable runtime filters when PHJ spills

This patch changes when runtime filters are produced in the partitioned
hash-join node to allow filters to be produced even when the PHJ
spills. Filters are now produced during the level0 processing of the
PHJ's build-side input in ProcessBuildBatch().

Since this function is codegen'ed, so now is filter production. We use
constant-propagation via constant argument injection to disable filter
production at no cost when it is not needed (including in level1+
repartitioning). I inspected the IR to confirm that the constant
propagation works as expected.

This change also allows us to send filters earlier during build-side
processing. A tradeoff is that filters are still built even if the
expected FP rate is too high, although any too-permissive filters are
still not sent to the scan.

The restriction that prevented filters from being computed inside a
sub-plan is removed as part of this cleanup (since the FE handles
assigning filters correctly in subplans), and a test is added to confirm
that one of the correct cases for filters in subplans works.

This patch also fixes a bug where re-partitioning beyond level0 would
not use the codegen'ed implementation of ProcessBuildBatch().

A new test is added to test_runtime_row_filters, for Parquet only, which
spills and confirms that filtering still occurs.

Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
7 files changed, 125 insertions(+), 121 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/83/2783/5
-- 
To view, visit http://gerrit.cloudera.org:8080/2783
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
View raw message