impala-reviews mailing list archives

From "Zach Amsden (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries
Date Mon, 22 May 2017 20:33:37 GMT
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries

Patch Set 7:


Thanks for looking at this!
Commit Message:

Line 7: IMPALA-4864 Speed up single slot predicates with dictionaries
> The JIRA seems to be talking about doing something slightly different: taki
There is a bit of a disconnect with the JIRA as specified and the intention behind the JIRA,
which had me initially working down the wrong path.  I realized that (x OP k) type predicates
can only be evaluated for OP = or <=>, and started working on a way to identify such
predicates in sub-expressions of the conjuncts.  It all seemed quite speculative and limited
in applicability.  One can imagine expanding this to ordering predicates with sorted dictionaries,
but as we don't have those yet, it would be building without a foundation.

After clarification from Marcel, it became apparent that the intent was to evaluate arbitrary
single-slot predicates against the dictionary (as we do anyway for row-group filtering) and
save those results, eliminating the cost of re-evaluating the same conjuncts later.

The central difficulty of this problem is that the set of conjuncts which can be dictionary
evaluated is not known until row group time.

This is the best proposal I have come up with so far for solving that problem.

I agree we should get the design right on this one.  Doing this wrong could be costly in terms
of either performance or complexity.
File be/src/exec/

Line 40
> I believe this code was deliberately written like this to avoid the indirec
I have no problem reverting this.  Originally I had the two functions merged into one, but
I didn't like imposing the overhead of checking the bitmap on this version.  While the code
evolved, I had to put an interface on ScratchBatch just for sanity, as too many pieces of
code were playing with its internals.

I think the interface is actually broad enough now to go back to the original code (minus
the interference with internals).  I'll simply do this at the end:

scratch_batch_->Advance((scratch_tuple - scratch_tuple_start) / tuple_size);

Line 77:        tuple_index = scratch_batch_->NextValidTuple()) {
> It seems like this might be slow for the case when many or all of the tuple
Your intuition and mine are similar.  I did my best to make the "simple" version of NextSetBit()
as fast as possible, but I believe intrinsics might actually be warranted here.

Wanted to get the core ideas sound before going in that direction though.

Oh, and for reference, I also tried various other bitmap implementations - boost::dynamic_bitset
and std::vector<bool>.  I didn't like the boost dependency (or the code much), and the
assembly that the std::vector<bool> (specialized bit-vector) implementation compiled down to
sent me running for the hills.

Branches, branches everywhere...
File be/src/exec/

Line 935
> I'm not sure, but this hoisting may have been deliberate because the compil
It certainly shouldn't hurt to re-hoist it.

Line 208:   dict_filters_active_.reset(new bool[scanner_conjunct_ctxs_->size()]);
> This probably works fine in practice but afaik scoped_ptr will call free in
Will check.

PS7, Line 1051: _
> nit: extraneous _
Originally a class member, until I realized this function had no access.  Will fix.
File be/src/exec/

Line 317:   bool ReadValueBatch(ScratchTupleBatch& scratch, int* num_values) {
> I think it would be best to avoid using the ScratchTupleBatch abstraction i
My guess is that inlining the functions from the header should get us about as close to optimal as possible.

Maybe hoisting 'max = scratch.capacity()' to tell the compiler it is unchanged throughout
the loop would help, but other than that, I don't see that much room for improvement has been
lost.  Let's see what the perf results look like when they are ready.

I don't want to pass the bitmap directly, as I only want the bitmap to be touched (or even
known about) by those pieces of code which actually are running in a filtered column reader.

Line 408:           if (IS_FILTERED && IS_DICT_ENCODED &&
> This patch pushes the predicate evaluation down into ReadFilteredSlot() so 
Perhaps there is a way to do this, but I don't see a clear path to that.  We have some choices,
but they are all problematic:

Materialize codewords and values.  Where do the codewords go?  There is no space in the tuple.

Materialize codewords only.  What do we do when there is no dictionary?

Materialize codewords and values conditionally.  Now we have to write code to deal with either
case, and to go back into the dictionary to convert codewords back into values, even though
the functionality to do that is hidden inside the private details of the dict<T> decoder and
not exposed to the scanner at all.

In any case, doing predicate evaluation columnwise is certainly interesting (SIMD combined
with a lazy value lookup cache seems especially promising), but it completely wreaks havoc
with expression evaluation order, and I think dealing with the ramifications of that is probably
beyond the scope of this diff.

Here, we are just trying to make use of our pre-computed dictionary filter results as efficiently
as possible.

PS7, Line 408: IS_FILTERED
> I think IS_FILTERED could just be a function template parameter similar to 
The behavior differs at the class level, not just in this function, so I don't see how
that would work.

Line 417:           if (IS_FILTERED) batch.InvalidateTuple(val_index);
> This and setting the null indicator can be mutually exclusive,  right?  Sin
True.  (Was not so in original code).

There's actually a bug here in our decoding, but I don't want to open that right now.  We
don't throw any kind of error on getting a NULL decoding for a non-NULLable column.

PS7, Line 468:  !column_has_match_
> Maybe rename the member variable to match the function name. I think the cu

Line 609:   bool true_on_null_;
> Unused? It does seem like we need this for correctness in some cases, right
I tried to do some optimization based on this and allowed passing in predicates that were
true on null in the last diff, but it looked very speculative and ended up being a lot of
work and complexity.  I'll get rid of this.
File be/src/exec/parquet-column-readers.h:

Line 24: #include "exec/parquet-scratch-tuple-batch.h"
> I think we only need a forward declaration in this header - not a big deal 
Will do the bare minimum I can get away with.

Unfortunately very familiar with the compile time problem.

PS7, Line 374: aller must pass a tuple that can be used to
             :   // materialize temporary values from the dictionary.
> I think this needs updating - there's a callsite that passes in  nullptr.

Line 391:   // Returns true if the row group has no columns which pass filtering conjuncts,
> The function name and comment aren't too enlightening for me.
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 115:   Bitmap valid_bitmap_;
> We should consider if it's better to use a bitmap or an array of bools. The
I think our bitmap implementation is quite good - if we manage to get this compiled down to
a `testb M, bit`, we should be in good shape.

Again, let's wait for benchmarks to tell us the answer.
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test:

Line 25:    parquet dictionary predicates: int_col IS NULL, int_col > 1
Oops, this needs reverting since I removed support for true on NULL predicates.


Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <>
Gerrit-Reviewer: Joe McDonnell <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Reviewer: Zach Amsden <>
Gerrit-HasComments: Yes
