impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Henry Robinson (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-3141: Send full filters when filter production is disabled
Date Tue, 15 Mar 2016 17:24:14 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3141: Send full filters when filter production is disabled

Patch Set 3:

File be/src/exec/blocking-join-node.h:

Line 97:   /// inside a subplan or the false-positive rate would be too high.
> we can have more than one filter per build side, each one with its own effe
File be/src/exec/

Line 1723:         if (filter->IsFull() || reject_ratio < FLAGS_parquet_min_filter_reject_ratio)
> full/invalid filters shouldn't be part of filter_ctxs_
Why not? It seems reasonable to me that a context should be able to answer questions about
the filter it contains. The alternative is to allow something like:


but I think that breaks abstraction further by exposing the inner implementation.

The idea of letting all values pass the filter is general over all implementations (e.g. a
filter on a < predicate where the minimum is -MAX_INT).
File be/src/exec/

Line 402:     // filter due to a spilled partition.
> is there a todo to separate spilling from filters?
There's a JIRA: IMPALA-3077

Line 521:   BOOST_FOREACH(const FilterContext& ctx, filters_) {
> range for?
Avoiding C++11 in this file which is, in part, cross-compiled.
File be/src/runtime/

Line 1992:     if (state->completion_time != 0) return;
> it make more sense to adjust the pending_count in that case to avoid confus
File be/src/runtime/

Line 106:       if (it == consumed_filters_.end()) return;
> how is this possible again? hasn't got registered yet?
This is a hangover from when filters could be both local and have the coordinator deliver
them (before the code explicitly distinguished local and global targets).

Line 113:       // TODO: Avoid need for this copy.
> this is for local targets. why is there a need for this copy?
ah, it's not needed after Michael's changes to the way filters are accounted for.

Line 120:       // Take lock only to ensure no race with PublishGlobalFilter() - there's no
need for
> since this is a local target, is PublishGlobalFilter even called? (the coor
No, it's not any more (the coordinator's not even involved - local targets aren't ever sent
to the coordinator).

Line 148:   if (it->second->filter_desc().is_broadcast_join && it->second->HasBloomFilter())
> would be nice to eliminate this code path in the coordinator as part of thi
Can be removed.
File be/src/runtime/runtime-filter.h:

Line 201:   bool IsFull() const { return HasBloomFilter() && bloom_filter_ == NULL;
> i find the 'full' terminology a bit problematic, because we also have Bloom
Sorry, I'm having some trouble parsing that first sentence - what's the relationship between
Insert() and IsFull() ?

In general I don't like 'invalid' so much because it suggests there's an unexpected error
(should the query stop when it gets an invalid filter?) where full filters are just a particular
instance of a filter and so logically shouldn't be treated any differently.
File be/src/util/

Line 31:       // are 64 = 2^6 bytes in a cache line.
> could you add a static_assert for that?
Not sure exactly what you want asserted - that 1 << LOG_BUCKET_WORD_BITS == 64?

Line 64: void BloomFilter::ToThrift(TBloomFilter* thrift) const {
> while you're fixing things: you should point out in the .h file that this i
I don't think it is destructive - the string constructor copies directory_, not the pointer.
File common/thrift/ImpalaInternalService.thrift:

Line 549:   4: required bool is_full
> instead of introducing a new concept (full), couldn't we phrase that as eff
I thought about that. My feeling was that encoding a full filter as one with size 0 is not
representationally consistent (does a size 0 vector have all its bits set, or none of them?),
and therefore requires just as much logic to deal with. as you need to special case directory.size()
== 0. In that case it's better, in my opinion, to explicitly record the fact that the filter
is full.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I04b3e6542651c1e7b77a9bab01d0e3d9506af42f
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <>
Gerrit-Reviewer: Henry Robinson <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-HasComments: Yes

View raw message