impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3007: Adjust Bloom Filter size according to NDV estimate
Date Thu, 28 Apr 2016 23:59:56 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3007: Adjust Bloom Filter size according to NDV estimate
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/2812/4/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 143:     int64_t filter_size = state_->filter_bank()->GetFilterSizeForNdv(
> i think it's best to compute the size during initialization of the filter c
Done


http://gerrit.cloudera.org:8080/#/c/2812/4/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 522:         ctx.filter->filter_desc().ndv_estimate);
> why not record the actual size in the filter context. that way, you can't h
It's a bit more convenient to put it in the RuntimeFilter itself (that way the RuntimeFilterBank
can set it without the caller having to do so).


Line 524:     bool fp_rate_too_high =
> don't we check this periodically, or is that only done in the parquet scann
Only in the parquet scanner. Not clear if there'd be an advantage to doing it periodically
here as doing so would add a branch back to the filter building path.


http://gerrit.cloudera.org:8080/#/c/2812/4/fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/DistributedPlanner.java:

Line 518:       // estimates may have changed as well.
> looking at this again, i'm not sure i follow that argument. in what way hav
After further digging in, I think this is a bug. With a join-node that has a union as build
input and a limit on the union, the cardinality estimate after single-node planning (when
the filters are originally generated) is incorrect. It does not apply the limit.

After fragmentation (and the insertion of an xchg into the build input) the cardinality corrects
itself. 

See IMPALA-3450 for more details.


http://gerrit.cloudera.org:8080/#/c/2812/4/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test
File testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test:

Line 6: # consumption / spilling behaviour.
> move that last sentence to the very top, as a separate comment, because it 
Not sure exactly where you mean: this is the only query this applies to. (The nested-types
query below is here because the old HJ implementation doesn't work with nested-types).


Line 26: 
> reign in the blank lines
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2812
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1fe37b8d4cfb3c52bb8e8cf0ca55e92665b87803
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message