impala-dev mailing list archives

From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3141: Send dummy filters when filter production is disabled
Date Wed, 23 Mar 2016 22:15:10 GMT
Hello Marcel Kornacker, Internal Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2475

to look at the new patch set (#11).

Change subject: IMPALA-3141: Send dummy filters when filter production is disabled
......................................................................

IMPALA-3141: Send dummy filters when filter production is disabled

The PHJ (partitioned hash join) may disable runtime filter production
for one of several reasons, including a predicted high false-positive
rate. If the filters are not produced, any scan waiting on them blocks
for its entire timeout before continuing.

This patch changes the filter logic to always send a filter, even if one
wasn't actually produced by the PHJ. To preserve correctness, that
filter must contain every element of the set. Such a filter is
represented by (BloomFilter*)NULL. This allows us to make no changes to
RuntimeFilter::Eval(), which already returns true if the member Bloom
filter is NULL.
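
To illustrate the NULL-as-universal-set convention described above, here is a
minimal sketch. It is not the code from this patch: SketchBloomFilter and
SketchRuntimeFilter are hypothetical stand-ins (an exact hash set replaces the
real Bloom filter), and the HasArrived() and AlwaysTrue() helpers are
assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for a Bloom filter. An exact set keeps the sketch short.
class SketchBloomFilter {
 public:
  void Insert(uint32_t hash) { hashes_.insert(hash); }
  bool Find(uint32_t hash) const { return hashes_.count(hash) > 0; }

 private:
  std::unordered_set<uint32_t> hashes_;
};

// Hypothetical, simplified runtime filter. A null Bloom filter pointer stands
// for the filter that contains every element.
class SketchRuntimeFilter {
 public:
  // Publishes the filter. Passing nullptr publishes the always-true filter.
  void SetBloomFilter(const SketchBloomFilter* filter) {
    assert(!has_arrived_);  // In this sketch a filter is published only once.
    bloom_filter_ = filter;
    has_arrived_ = true;
  }

  // True once any filter, real or dummy, has been published.
  bool HasArrived() const { return has_arrived_; }

  // True if a filter has arrived and it cannot reject anything.
  bool AlwaysTrue() const { return has_arrived_ && bloom_filter_ == nullptr; }

  // A null member filter accepts every element, so the dummy filter needs no
  // special case here.
  bool Eval(uint32_t hash) const {
    if (bloom_filter_ == nullptr) return true;
    return bloom_filter_->Find(hash);
  }

 private:
  const SketchBloomFilter* bloom_filter_ = nullptr;
  bool has_arrived_ = false;
};
```

In this sketch a consumer blocked on HasArrived() is released as soon as the
dummy filter is published, and Eval() never rejects a row.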

In RPCs, a new field is added to TBloomFilter to identify filters that
are always true.
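
A rough sketch of how such a flag could round-trip a NULL filter through an
RPC follows. SketchTBloomFilter is a hypothetical plain struct rather than the
Thrift-generated TBloomFilter, and the field name always_true, the directory
payload, and the ToWire()/FromWire() helpers are all assumptions made for
illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical plain-struct stand-in for the wire representation.
struct SketchTBloomFilter {
  std::vector<uint8_t> directory;  // Filter bits, empty for the dummy filter.
  bool always_true = false;        // Assumed name for the new field.
};

// Hypothetical in-memory filter. Only the raw bits matter for this sketch.
struct SketchFilterBits {
  std::vector<uint8_t> directory;
};

// Serializes a filter. A null pointer becomes an always-true message that
// carries no bits.
SketchTBloomFilter ToWire(const SketchFilterBits* filter) {
  SketchTBloomFilter wire;
  if (filter == nullptr) {
    wire.always_true = true;
    return wire;
  }
  wire.directory = filter->directory;
  return wire;
}

// Deserializes a filter. An always-true message becomes a null pointer again.
std::unique_ptr<SketchFilterBits> FromWire(const SketchTBloomFilter& wire) {
  if (wire.always_true) {
    assert(wire.directory.empty());  // The dummy filter carries no payload.
    return nullptr;
  }
  return std::unique_ptr<SketchFilterBits>(
      new SketchFilterBits{wire.directory});
}
```

Sending an explicit flag rather than a fully populated bit array keeps the
dummy message small, which is one plausible reason for adding a field.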

The HdfsParquetScanner checks to see if filters would always return true
for any element, and disables them if so.
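
The scanner-side check might look roughly like the sketch below. SketchFilter
and DropAlwaysTrueFilters() are hypothetical names, not the HdfsParquetScanner
code. The point is only that a filter which accepts every element can be
dropped up front so that no per-row evaluation is spent on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical filter handle. Only the always-true property matters here.
struct SketchFilter {
  bool always_true;
  bool AlwaysTrue() const { return always_true; }
};

// Removes filters that cannot reject any row from the active list and returns
// the number of filters dropped.
std::size_t DropAlwaysTrueFilters(std::vector<const SketchFilter*>* filters) {
  assert(filters != nullptr);
  const std::size_t before = filters->size();
  filters->erase(
      std::remove_if(filters->begin(), filters->end(),
                     [](const SketchFilter* f) { return f->AlwaysTrue(); }),
      filters->end());
  return before - filters->size();
}
```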

There is some miscellaneous cleanup in this patch, particularly the
removal of unused members in BloomFilter.

This patch has been manually tested on queries that would otherwise take
a long time to time out. A unit test was added to ensure that queries do
not wait.

Change-Id: I04b3e6542651c1e7b77a9bab01d0e3d9506af42f
---
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/cpu-info.cc
M be/src/util/cpu-info.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/com/cloudera/impala/planner/HashJoinNode.java
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test
21 files changed, 226 insertions(+), 178 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/75/2475/11
-- 
To view, visit http://gerrit.cloudera.org:8080/2475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04b3e6542651c1e7b77a9bab01d0e3d9506af42f
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
