impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zach Amsden (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries
Date Mon, 05 Jun 2017 18:55:04 GMT
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries
......................................................................


Patch Set 16:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/hdfs-parquet-scanner-ir.cc
File be/src/exec/hdfs-parquet-scanner-ir.cc:

Line 89:     idx = scratch_batch_.NextValidTuple(idx);
Surprising, despite apparently being costlier, the bitmap scan appears to be fast, especially
when predicates are selective.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 420:               LIKELY(dictionary_results_.num_bits() > 0)) {
Not liking this branch, but it is unavoidable without pre-computation overhead.  When we have
a partially dictionary encoding on a filtered column that spills to plain encoding, we don't
pre-evaluate the dictionary against predicates.  We could do so anyway, but it may end up
being an unnecessary up-front cost.  However, the number of dictionary entries before spilling
to plain is bounded, so the cost should be fixed.

Maybe we can afford to do so if the predicates are "inexpensive", but then we are still left
with this check for expensive predicate (and have no way to determine cost here without another
check).


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

Line 172:     // pointer.  We don't need to transfer the extra filter byte.
Comment obsolete with latest diff.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

Line 94:   NullIndicatorOffset(int byte_offset = 0, int bit_offset = -1)
This fixes an apparent bug, where byte_offset = -1 (passed in constructor for ParquetColumnReader
as NullIndicatorOffset(-1, -1) ends up converting into code which will execute something like:

tuple_mem[-1] |= 0

Which despite not modifying memory, could be an out of bounds memory access.


http://gerrit.cloudera.org:8080/#/c/6726/16/be/src/util/bitmap-test.cc
File be/src/util/bitmap-test.cc:

Line 81:   for (index = bm.NextSetBit(); index < size; index = bm.NextSetBit(index)) {
test needs updating for behavior change: index -> index + 1


-- 
To view, visit http://gerrit.cloudera.org:8080/6726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <zamsden@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zamsden@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message