impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3141: Send full filters when filter production is disabled
Date Tue, 15 Mar 2016 17:24:14 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3141: Send full filters when filter production is disabled
......................................................................


Patch Set 3:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/blocking-join-node.h
File be/src/exec/blocking-join-node.h:

Line 97:   /// inside a subplan or the false-positive rate would be too high.
> we can have more than one filter per build side, each one with its own effe
Done


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1723:         if (filter->IsFull() || reject_ratio < FLAGS_parquet_min_filter_reject_ratio)
{
> full/invalid filters shouldn't be part of filter_ctxs_
Why not? It seems reasonable to me that a context should be able to answer questions about
the filter it contains. The alternative is to allow something like:

  filter_ctx->filter()->IsFull()

but I think that breaks abstraction further by exposing the inner implementation.

The idea of letting all values pass the filter is general over all implementations (e.g. a
filter on a < predicate where the minimum is -MAX_INT).


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 402:     // filter due to a spilled partition.
> is there a todo to separate spilling from filters?
There's a JIRA: IMPALA-3077


Line 521:   BOOST_FOREACH(const FilterContext& ctx, filters_) {
> range for?
Avoiding C++11 in this file which is, in part, cross-compiled.


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 1992:     if (state->completion_time != 0) return;
> it make more sense to adjust the pending_count in that case to avoid confus
Done


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/runtime-filter.cc
File be/src/runtime/runtime-filter.cc:

Line 106:       if (it == consumed_filters_.end()) return;
> how is this possible again? hasn't got registered yet?
This is a hangover from when filters could be both local and have the coordinator deliver
them (before the code explicitly distinguished local and global targets).


Line 113:       // TODO: Avoid need for this copy.
> this is for local targets. why is there a need for this copy?
ah, it's not needed after Michael's changes to the way filters are accounted for.


Line 120:       // Take lock only to ensure no race with PublishGlobalFilter() - there's no
need for
> since this is a local target, is PublishGlobalFilter even called? (the coor
No, it's not any more (the coordinator's not even involved - local targets aren't ever sent
to the coordinator).


Line 148:   if (it->second->filter_desc().is_broadcast_join && it->second->HasBloomFilter())
{
> would be nice to eliminate this code path in the coordinator as part of thi
Can be removed.


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

Line 201:   bool IsFull() const { return HasBloomFilter() && bloom_filter_ == NULL;
}
> i find the 'full' terminology a bit problematic, because we also have Bloom
Sorry, I'm having some trouble parsing that first sentence - what's the relationship between
Insert() and IsFull() ?

In general I don't like 'invalid' so much because it suggests there's an unexpected error
(should the query stop when it gets an invalid filter?) where full filters are just a particular
instance of a filter and so logically shouldn't be treated any differently.


http://gerrit.cloudera.org:8080/#/c/2475/3/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 31:       // are 64 = 2^6 bytes in a cache line.
> could you add a static_assert for that?
Not sure exactly what you want asserted - that 1 << LOG_BUCKET_WORD_BITS == 64?


Line 64: void BloomFilter::ToThrift(TBloomFilter* thrift) const {
> while you're fixing things: you should point out in the .h file that this i
I don't think it is destructive - the string constructor copies directory_, not the pointer.


http://gerrit.cloudera.org:8080/#/c/2475/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 549:   4: required bool is_full
> instead of introducing a new concept (full), couldn't we phrase that as eff
I thought about that. My feeling was that encoding a full filter as one with size 0 is not
representationally consistent (does a size 0 vector have all its bits set, or none of them?),
and therefore requires just as much logic to deal with. as you need to special case directory.size()
== 0. In that case it's better, in my opinion, to explicitly record the fact that the filter
is full.


-- 
To view, visit http://gerrit.cloudera.org:8080/2475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I04b3e6542651c1e7b77a9bab01d0e3d9506af42f
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message