impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Date Thu, 12 Oct 2017 00:13:34 GMT
Tim Armstrong has posted comments on this change. ( )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu

Patch Set 6:

Commit Message:
PS6, Line 15: them into the Kudu client. Because the Kudu client doesn't provide a
Is there any documentation for Kudu about what ordering Kudu uses for comparisons for different
types? I guess we already push filters so we're already depending on the orderings being the
same as Impala, but I'd be interested to know. 

We had a similar discussion about Parquet and the spec got clarified a lot as a result (
now specifies it). Parquet-MR also had various odd behaviour that got shaken out as a result
of this.
File be/src/exec/filter-context.h:
PS6, Line 106: bloom_filter
has_bloom_filter() ? At callsites it reads like this should return a filter rather than a
PS6, Line 109: min_max_filter
File be/src/runtime/
PS6, Line 1220:       MinMaxFilter::Or(src_type(), params.min_max_filter, min_max_filter_.get());
I mentioned this in a later comment, but we should consider what happens if the min/max values
are very large - maybe thre's some memory management similar to above.
File be/src/runtime/
PS6, Line 235:   MinMaxFilter* min_max_filter = MinMaxFilter::Create(type, &obj_pool_);
(nit) - could combine into one line.
File be/src/runtime/
PS6, Line 27:   // to a) the atomicity of / pointer assignments and b) the x86 TSO memory
Comment not relevant any more since we're using actual atomics?
File be/src/util/
PS6, Line 53:       min_str = string(value->ptr, value->len);
We should think through the behaviour with very large strings. It may make sense to disable
the filters if the strings are too large instead of allocating arbitrarily large amounts of

Maybe we could also do some trickery with truncating the strings to avoid handling the filters
not existing. E.g. if we have a 4 byte limit on min/max, replace min="aaaaaaa" with min="aaaa"
and max="zzzzzz" with max="zzz(". Interested to hear what you think - might be too clever
but seems like it could work.

It would be good to also allocate tracked memory for the string. The Parquet ColumnStats class
solves a similar problem using StringBuffers that allocate from a MemPool.
File be/src/util/min-max-filter.h:
PS6, Line 83: #define NUMERIC_MIN_MAX_FILTER(NAME, TYPE)                         \
Yeah, the macro seems like it may be the least of the evils.
File be/src/util/
PS6, Line 77:   std::string NAME##MinMaxFilter::DebugString() const {                    
nit: don't need std:: prefix in .cc files.
PS6, Line 139:       out->min.__set_string_val(std::min(in.min.string_val, out->min.string_val));
I feel like we should use our StringValue::Compare() function instead of relying on std::string's
comparison function. I'm not sure if they are exactly the same but if we compare these the
same way as we compare StringValue's it's easier to reason about.
PS6, Line 291:       BigIntMinMaxFilter::Or(in, out);
Shouldn't this be calling the timestamp method?
PS6, Line 327:   }
File fe/src/main/java/org/apache/impala/planner/
PS6, Line 379:         if (slotRef == null || slotRef.getDesc().getColumn() == null
Are there planner tests for all these cases?
File testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test:
PS6, Line 30: kudu
Can we make this something like table_format=kudu so that it's a bit more self-describing
and future-proof?
File tests/query_test/
PS6, Line 61:       lambda v: v.get_value('table_format').file_format not in ['hbase', 'kudu'])
nit: 4 character indents on line continuations (I guess this is preserved from the existing
File tests/util/
PS6, Line 252:           allowed_formats = ['kudu']
Thanks for adding this validation, hopefully will lead people in the right direction if they
need to generalise this.
PS6, Line 254: hint
hint? Isn't it a table format?

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Thu, 12 Oct 2017 00:13:34 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message