impala-dev mailing list archives

From "Henry Robinson (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-3007: Adjust Bloom Filter size according to NDV estimate
Date Thu, 28 Apr 2016 03:40:58 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3007: Adjust Bloom Filter size according to NDV estimate

Patch Set 1:

File be/src/exec/

Line 230:   hash_tbl_->AddBloomFilters();
> huh?
Could you be a bit more descriptive? :)

All the logic for checking FP rates has gone into AddBloomFilters().
File be/src/exec/

Line 156:     uint32_t log_space = state->filter_bank()->GetLogSpaceForNdv(filter.ndv_estimate);
> why not have a GetFilterByteSize() or something like that. the scan node sh
File be/src/exec/

Line 148:           filters_[i]->filter_desc().filter_id);
> this takes the capacity, not an id.
File be/src/exec/

Line 500:         state->filter_bank()->FpRateTooHigh(ndv_estimate, total_build_rows);
> just as an aside: instead of looking at build rows, which is indirect, why 
File be/src/runtime/

Line 171:   uint64_t required_space =
> let's not use unsigned ints
File be/src/runtime/runtime-filter.h:

Line 77:   /// expected false-positive rate would be larger than allowed by
> "a filter's expected false-positive rate would exceed flags_max_filter_erro

Line 79:   bool FpRateTooHigh(uint64_t expected_ndv, uint64_t observed_ndv);
> role of expected_ndv unclear

Line 94:   BloomFilter* AllocateScratchBloomFilter(int64_t ndv_estimate);
> instead of continuing to talk about ndv and estimates, which are fe concept
Why do you feel the NDV is an FE-only concept? In my opinion it's the key parameter in determining
the size of the BF.
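
To illustrate why the NDV drives the size, here is a minimal sketch using the textbook Bloom filter sizing formula. It is not the code in this patch: MinLogSpaceBytes() and its max_fpp parameter are hypothetical names, and it assumes a classic filter with an optimal number of hash functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not Impala's API). Returns the smallest log2(bytes)
// such that a classic Bloom filter holding `ndv` distinct values has an
// expected false-positive rate of at most `max_fpp`, assuming the optimal
// number of hash functions is used.
int MinLogSpaceBytes(int64_t ndv, double max_fpp) {
  if (ndv <= 0) return 0;
  // Textbook result: m = -n * ln(p) / (ln 2)^2 bits.
  const double ln2 = std::log(2.0);
  const double bits =
      -static_cast<double>(ndv) * std::log(max_fpp) / (ln2 * ln2);
  // Convert to bytes, never going below a single byte.
  const double bytes = std::max(1.0, bits / 8.0);
  // Round up to the next power of two, expressed as an exponent.
  return static_cast<int>(std::ceil(std::log2(bytes)));
}
```

Under these assumptions, one million distinct values at a 1% target rate need about 1.2MB, which rounds up to 2^21 bytes. The target rate only enters through its logarithm, so the NDV dominates the result.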
File fe/src/main/java/com/cloudera/impala/planner/

Line 415:         filter.computeNdvEstimate();
> this also needs to happen for repartitioning joins
Done (as a result of the previous patch which refactored this method).
File fe/src/main/java/com/cloudera/impala/planner/

Line 113:     // Estimate of the number of distinct values that will be inserted into this
> explain meaning of -1
Done. Even for repartitioning joins we want the total NDV across all instances, since the
filters will be merged.
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:

Line 253: # Test case 11: filters with high expected FP rate get disabled.
> what does "expected" mean here?
That the rate of false-positives when probing the filter is expected to be high. The actual
FP-rate is irrelevant.
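
As a rough sketch of what "expected" means here, the rate can be predicted from the filter size and the number of distinct values alone, without probing anything. The code below uses the textbook formula for a classic Bloom filter. ExpectedFpRate() and ExpectedFpRateTooHigh() are hypothetical names, and num_hashes and max_fpp are assumed parameters, so this is not the logic in the patch.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not Impala's API). Expected false-positive rate of a
// classic Bloom filter with `num_bits` bits and `num_hashes` hash functions
// after `ndv` distinct insertions: (1 - e^(-k*n/m))^k.
double ExpectedFpRate(int64_t num_bits, int64_t ndv, int num_hashes) {
  if (ndv <= 0) return 0.0;       // An empty filter never matches.
  if (num_bits <= 0) return 1.0;  // No space: every probe is a "hit".
  // Fraction of bits expected to be set after all insertions.
  const double fill =
      1.0 - std::exp(-static_cast<double>(num_hashes) * ndv / num_bits);
  return std::pow(fill, num_hashes);
}

// Hypothetical check: true if the predicted rate exceeds the allowed maximum.
bool ExpectedFpRateTooHigh(int64_t num_bits, int64_t ndv, int num_hashes,
                           double max_fpp) {
  return ExpectedFpRate(num_bits, ndv, num_hashes) > max_fpp;
}
```

For example, a 2MB filter (16,777,216 bits) with 7 hash functions is predicted to stay well under 1% for one million distinct values, but far above it for ten million. In the second case the filter would be disabled whatever rate the probes actually observe.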
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters_wait.test:

Line 37: row_regex: .*0 of 1 Runtime Filters Produced.*
> i'm not sure about this error message, it makes it sound like something wen
What would you prefer? In a certain sense, something did go wrong (we guessed the size
for a filter, and the guess was enough of an underestimate that the filter was disabled).


Gerrit-MessageType: comment
Gerrit-Change-Id: I1fe37b8d4cfb3c52bb8e8cf0ca55e92665b87803
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <>
Gerrit-Reviewer: Henry Robinson <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-HasComments: Yes
