impala-dev mailing list archives

From "Henry Robinson (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-3077: Enable runtime filters when PHJ spills
Date Tue, 19 Apr 2016 17:13:51 GMT
Hello Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change.  Please visit

to look at the new patch set (#7).

Change subject: IMPALA-3077: Enable runtime filters when PHJ spills

IMPALA-3077: Enable runtime filters when PHJ spills

This patch changes when runtime filters are produced in the partitioned
hash-join node to allow filters to be produced even when the PHJ
spills. Filters are now produced during the level0 processing of the
PHJ's build-side input in ProcessBuildBatch().

Since this function is codegen'ed, filter production is now codegen'ed
as well. We use constant propagation via constant argument injection to
disable filter production at no cost when it is not needed (including
in level1+ repartitioning). I inspected the IR to confirm that the
constant propagation works as expected.
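
As a rough illustration of the idea (not the code in this patch), the sketch
below uses a C++ template parameter as a stand-in for the injected constant
argument. Impala performs the substitution on LLVM IR at query codegen time;
here the compiler does the equivalent folding at build time. ToyFilter,
ProcessBuildRows and ProcessLevel are hypothetical names invented for the
example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a runtime filter: marks hash buckets as seen.
struct ToyFilter {
  std::vector<bool> bits = std::vector<bool>(64, false);
  int inserts = 0;
  void Insert(uint32_t hash) {
    bits[hash % bits.size()] = true;
    ++inserts;
  }
};

// BUILD_FILTERS plays the role of the injected constant argument. Because it
// is a compile-time constant, the optimizer can delete the branch entirely in
// the BUILD_FILTERS == false instantiation.
template <bool BUILD_FILTERS>
int ProcessBuildRows(const std::vector<uint32_t>& hashes, ToyFilter* filter) {
  int rows = 0;
  for (uint32_t h : hashes) {
    if (BUILD_FILTERS) {
      assert(filter != nullptr);
      filter->Insert(h);
    }
    ++rows;  // stands in for adding the row to its hash partition
  }
  return rows;
}

// Assumed policy for the example: level 0 builds filters, level 1+
// repartitioning does not.
int ProcessLevel(int level, const std::vector<uint32_t>& hashes,
                 ToyFilter* filter) {
  return level == 0 ? ProcessBuildRows<true>(hashes, filter)
                    : ProcessBuildRows<false>(hashes, filter);
}
```

Calling ProcessLevel(0, ...) populates the filter, while ProcessLevel(1, ...)
runs the same loop with the filter code compiled out, which is the "no cost
when not needed" property described above.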

This change also allows us to send filters earlier during build-side
processing. The tradeoff is that filters are still built even when the
expected false-positive (FP) rate is too high, although such
too-permissive filters are still not sent to the scan (see 'Performance
impact' below).
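
To make the "build always, send only if useful" tradeoff concrete, here is a
minimal sketch that uses the textbook Bloom filter estimate
(1 - exp(-k*n/m))^k for the expected FP rate. This is an assumption for
illustration: the patch does not state which formula or threshold Impala
uses, and ExpectedFpRate, ShouldSendFilter and max_fp_rate are hypothetical
names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Textbook expected false-positive rate of a Bloom filter with num_bits bits,
// num_hashes hash functions and num_keys distinct inserted keys.
double ExpectedFpRate(int64_t num_bits, int num_hashes, int64_t num_keys) {
  assert(num_bits > 0 && num_hashes > 0 && num_keys >= 0);
  double k = num_hashes;
  // Fraction of bits expected to be set after all insertions.
  double fill = 1.0 - std::exp(-k * num_keys / num_bits);
  return std::pow(fill, k);
}

// Hypothetical publish decision: the filter has already been built, but it is
// only sent to the scan if its expected FP rate is within max_fp_rate.
bool ShouldSendFilter(int64_t num_bits, int num_hashes, int64_t num_keys,
                      double max_fp_rate) {
  return ExpectedFpRate(num_bits, num_hashes, num_keys) <= max_fp_rate;
}
```

Under these assumptions, a filter that received far more keys than it has
bits is effectively saturated and would be dropped after being built, which
is the wasted work the benchmark below measures.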

The restriction that prevented filters from being computed inside a
sub-plan is removed as part of this cleanup (since the FE handles
assigning filters correctly in subplans), and a test is added to confirm
that one of the correct cases for filters in subplans works.

This patch also fixes a bug where re-partitioning beyond level0 would
not use the codegen'ed implementation of ProcessBuildBatch().

A new test is added to test_runtime_row_filters, for Parquet only, which
spills and confirms that filtering still occurs.

Finally, the legacy --enable_phj_probe_side_filtering /
--enable_probe_side_filtering flags have been deprecated, as runtime
filtering can be permanently disabled via setting
RUNTIME_FILTER_MODE=OFF. The implementation that the old flags referred
to has been removed.

Performance impact

We benchmark the cost of always computing runtime filters, even when the
FP rate turns out to be too high, as follows:

select STRAIGHT_JOIN count(*) from (select id from functional.alltypes
LIMIT 1) a JOIN [BROADCAST] (select * FROM p LIMIT 100000000) b on
= and b.part_col > 0

('p' is a two-column Parquet table with 1B rows).

This produces a 100M-row build table (benchmarks were run on one node).
When filtering is enabled, the filter is built but selects all rows from
the probe side, so the filter provides no benefit and the result
emphasises the cost of building it in the first place.

RUNTIME_FILTER_MODE    Avg. time (s) over 5 runs
OFF                    18.95
GLOBAL                 19.55
Change                 +3%

Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
M be/src/exec/
M be/src/exec/blocking-join-node.h
M be/src/exec/
M be/src/exec/
M be/src/exec/
M be/src/exec/partitioned-hash-join-node.h
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
7 files changed, 130 insertions(+), 122 deletions(-)

  git pull ssh:// refs/changes/83/2783/7

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I59a2d9ee03ccea6b674392584e4c7f272233571e
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Henry Robinson <>
Gerrit-Reviewer: Tim Armstrong <>
