impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries
Date Mon, 22 May 2017 18:54:23 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries

Patch Set 7:


Looks promising. We're likely to be building more things on top of this, and the potential
impact is pretty big if we made the right design decisions, so I want to make sure that we're
making the right choices here - I think there are a few decision points we need to think through
Commit Message:

Line 7: IMPALA-4864 Speed up single slot predicates with dictionaries
The JIRA seems to be talking about doing something slightly different: taking predicates of
form (x OP constant), translating the constant into the codeword, then comparing dict_index
directly to the codeword. That avoids any kind of dictionary or bitmap lookup and can be vectorized
so can be very effective. There are various papers about this kind of trick - I can find links
if you haven't seen them. I think maybe we need to split out this work into a separate JIRA.

Your current approach applies to a much larger range of predicates so is also very useful.
It would be good to think about whether the infrastructure will support that kind of optimisation
in the future. I think doing a more column-oriented approach might make it more natural to
do multiple passes over the column.

Line 41: function per file format, so we would have to simulate a file format
Yeah that codegen'd function map is a bit of a mess.
File be/src/exec/

Line 40
I believe this code was deliberately written like this to avoid the indirection via this->scratch_batch_
in the tight loop below - I think we should leave this alone.

Line 77:        tuple_index = scratch_batch_->NextValidTuple()) {
It seems like this might be slow for the case when many or all of the tuples are valid, since
NextSetBit() is fairly expensive compared to incrementing a counter. It's probably worth waiting
to see performance numbers, but my intuition is that we'll need to make sure that advancing
to the next bit is cheaper.
File be/src/exec/

Line 935
I'm not sure, but this hoisting may have been deliberate because the compiler couldn't hoist
the load via scratch_batch_ out of the loop

Line 208:   dict_filters_active_.reset(new bool[scanner_conjunct_ctxs_->size()]);
This probably works fine in practice but afaik scoped_ptr will call free instead of free[]
on the array. I think unique_ptr<bool[]> does the right thing.

Line 865:     // Legacy impala files cannot be eliminated here, because the only way to
I think we're missing an opportunity to apply the optimisation to columns with a mix of plain
and dictionary-encoded pages.

My understanding is that the pre-existing dictionary filtering optimisation only works when
the whole column is dictionary-encoded, but your new optimisation can be applied on a page-by-page

I'm not sure how hard it is to restructure things to get it to work in that case. I think
it's probably fairly low-impact because most columns will either be predominantly plain-encoded
or predominantly dict-encoded, but we should add a comment so that the limitation is explicit.

PS7, Line 1051: _
nit: extraneous _
File be/src/exec/hdfs-parquet-scanner.h:

Line 436:   /// XXX Why is this a scoped_ptr ?
IIRC so we don't need to include the full header.
File be/src/exec/

Line 317:   bool ReadValueBatch(ScratchTupleBatch& scratch, int* num_values) {
I think it would be best to avoid using the ScratchTupleBatch abstraction in this code - these
functions are perf critical and it's easier to reason about performance when the abstraction
is stripped away and the code is directly operating over contiguous arrays as much as possible.

Maybe just pass in the bitmap directly?

Line 408:           if (IS_FILTERED && IS_DICT_ENCODED &&
This patch pushes the predicate evaluation down into ReadFilteredSlot() so that it's done
value-by-value as they are materialised.

I'm not convinced this is the best approach - I think we should consider peeling predicate
evaluation out into a separate loop and doing it columnwise. This would mean that ReadSlot()
or some variant would materialize the values and the codewords into an array, then we'd do
another pass over the codeword array to evaluate the conjunctions. Or maybe materialize the
codewords in one pass, then doing another pass to evaluate conjuncts, then another one to
lookup the surviving values in the dictionary. I think there are lots of possible permutations.

This would give tighter loops and more scope for future optimisations like SIMD and should
also avoid the need to add another template parameter to the column readers.

PS7, Line 408: IS_FILTERED
I think IS_FILTERED could just be a function template parameter similar to IS_DICT_ENCODED

Line 417:           if (IS_FILTERED) batch.InvalidateTuple(val_index);
This and setting the null indicator can be mutually exclusive,  right?  Since the null indicator
doesn't mean anything in that case.

PS7, Line 468:  !column_has_match_
Maybe rename the member variable to match the function name. I think the current name is slightly
confusing because it's not clear what it means if the column has some plain-encoded data.

Line 609:   bool true_on_null_;
Unused? It does seem like we need this for correctness in some cases, right?
File be/src/exec/parquet-column-readers.h:

Line 24: #include "exec/parquet-scratch-tuple-batch.h"
I think we only need a forward declaration in this header - not a big deal but compile times
start getting worse if we're not vigilant.

PS7, Line 374: aller must pass a tuple that can be used to
             :   // materialize temporary values from the dictionary.
I think this needs updating - there's a callsite that passes in  nullptr.

Line 391:   // Returns true if the row group has no columns which pass filtering conjuncts,
The function name and comment aren't too enlightening for me.

"Returns true if this column has no values that pass conjunctions. This can be determined
in some cases based on the dictionary values."

Maybe something like RowGroupFailsConjuncts() or AllValuesFailConjuncts().
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 115:   Bitmap valid_bitmap_;
We should consider if it's better to use a bitmap or an array of bools. The bitmap has better
cache density, but will require more operations to read and write each value. We're only going
to have 1024 values typically, so it might be worth wasting cache to save instructions.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <>
Gerrit-Reviewer: Joe McDonnell <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message