impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
Date Thu, 19 Oct 2017 20:36:44 GMT
Hello Lars Volker, 

I'd like you to reexamine a change. Please visit

to look at the new patch set (#9).

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

Switch the decoders to use more batch-oriented interfaces. As an
intermediate step, this doesn't make the interfaces of LevelDecoder
or DictDecoder batch-oriented, only the lower-level utility classes.

The next step would be to change those interfaces to be batch-oriented
and make the corresponding optimisations in Parquet. This could deliver
much larger perf improvements than the current patch.

The high-level changes are:
* BitReader -> BatchedBitReader, which is built to unpack runs of 32
  bit-packed values efficiently.
* RleDecoder -> RleBatchDecoder, which exposes the repeated and literal
  runs to the caller and uses BatchedBitReader to unpack literal runs.
* Dict decoding uses RleBatchDecoder to decode repeated runs efficiently
  and uses the BitPacking utilities to unpack and encode in a single
  step.
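
To illustrate the run-oriented approach described above, here is a minimal
sketch, assuming the Parquet RLE/bit-packing hybrid layout (ULEB128 run
header, repeated runs stored as one little-endian value, literal runs
bit-packed LSB-first in groups of 8). UnpackBatch and DecodeRuns are
hypothetical names for illustration only; they are not the BatchedBitReader
or RleBatchDecoder interfaces from this patch, and error handling for
malformed input is deliberately minimal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala's BatchedBitReader): unpacks up to
// 'num_values' values of 'bit_width' bits each, packed LSB-first, from 'in'
// into 'out'. Returns the number of values unpacked, which is smaller than
// 'num_values' only if the input runs out.
inline size_t UnpackBatch(const uint8_t* in, size_t in_bytes, int bit_width,
                          size_t num_values, uint32_t* out) {
  assert(bit_width >= 0 && bit_width <= 32);
  const uint64_t mask = (uint64_t{1} << bit_width) - 1;
  uint64_t buffer = 0;    // Bits read from 'in' but not yet consumed.
  int buffered_bits = 0;  // Number of valid bits in 'buffer'.
  size_t pos = 0;
  for (size_t i = 0; i < num_values; ++i) {
    while (buffered_bits < bit_width) {
      if (pos == in_bytes) return i;
      buffer |= static_cast<uint64_t>(in[pos++]) << buffered_bits;
      buffered_bits += 8;
    }
    out[i] = static_cast<uint32_t>(buffer & mask);
    buffer >>= bit_width;
    buffered_bits -= bit_width;
  }
  return num_values;
}

// Hypothetical helper (not Impala's RleBatchDecoder): decodes up to
// 'num_values' values run by run. A repeated run is materialised with a
// single fill; a literal run is unpacked with a single UnpackBatch() call.
inline std::vector<uint32_t> DecodeRuns(const std::vector<uint8_t>& buf,
                                        int bit_width, size_t num_values) {
  assert(bit_width >= 0 && bit_width <= 32);
  std::vector<uint32_t> out;
  out.reserve(num_values);
  const size_t value_bytes = (static_cast<size_t>(bit_width) + 7) / 8;
  size_t pos = 0;
  while (out.size() < num_values && pos < buf.size()) {
    // Read the ULEB128-encoded run header.
    uint32_t header = 0;
    for (int shift = 0; pos < buf.size() && shift < 32; shift += 7) {
      const uint8_t byte = buf[pos++];
      header |= static_cast<uint32_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
    }
    const size_t remaining = num_values - out.size();
    if ((header & 1) == 0) {
      // Repeated run: (header >> 1) copies of one fixed-width value.
      const size_t run_len = header >> 1;
      if (pos + value_bytes > buf.size()) break;
      uint32_t value = 0;
      for (size_t i = 0; i < value_bytes; ++i) {
        value |= static_cast<uint32_t>(buf[pos++]) << (8 * i);
      }
      out.insert(out.end(), std::min(run_len, remaining), value);
    } else {
      // Literal run: (header >> 1) groups of 8 bit-packed values.
      const size_t run_len = static_cast<size_t>(header >> 1) * 8;
      const size_t to_read = std::min(run_len, remaining);
      const size_t old_size = out.size();
      out.resize(old_size + to_read);
      const size_t got = UnpackBatch(buf.data() + pos, buf.size() - pos,
                                     bit_width, to_read, out.data() + old_size);
      out.resize(old_size + got);
      if (got < to_read) break;
      pos += run_len * bit_width / 8;  // Skip the whole run, padding included.
    }
  }
  return out;
}
```

For example, with a bit width of 3, the bytes 0x0A 0x02 (a repeated run of
five 2s) followed by 0x03 0x88 0xC6 0xFA (one literal group holding 0..7)
decode to 2 2 2 2 2 0 1 2 3 4 5 6 7. The per-value branch on run type
disappears because each run is handled in one go.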

Also removes an older benchmark that isn't too interesting (since
the batch-oriented approach to encoding and decoding is so much
faster than the value-by-value approach).

* Ran core tests.
* Updated unit tests to exercise new code.
* Added test coverage for the deprecated bit-packed level encoding to
  verify that it still works (there was no coverage previously).
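
For context on what the new level-encoding test exercises: the deprecated
Parquet BIT_PACKED encoding packs fixed-width values back to back starting
from the most significant bit of each byte, the opposite bit order to the
RLE/bit-packing hybrid encoding. The sketch below is a value-by-value
illustration of that layout only; DecodeBitPackedLevels is a hypothetical
name and is not the code path added in this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes up to 'num_values' levels of 'bit_width' bits
// each from the deprecated BIT_PACKED layout (MSB-first within each byte).
// Stops early if the buffer does not hold another complete value.
inline std::vector<uint32_t> DecodeBitPackedLevels(
    const std::vector<uint8_t>& buf, int bit_width, size_t num_values) {
  assert(bit_width >= 0 && bit_width <= 32);
  std::vector<uint32_t> out;
  out.reserve(num_values);
  size_t bit_pos = 0;  // Absolute bit offset; bit 0 is the MSB of buf[0].
  for (size_t i = 0; i < num_values; ++i) {
    if (bit_pos + bit_width > buf.size() * 8) break;
    uint32_t value = 0;
    for (int b = 0; b < bit_width; ++b, ++bit_pos) {
      const uint8_t byte = buf[bit_pos / 8];
      const uint32_t bit = (byte >> (7 - bit_pos % 8)) & 1;
      value = (value << 1) | bit;
    }
    out.push_back(value);
  }
  return out;
}
```

With a bit width of 3, the bytes 0x05 0x39 0x77 decode to the levels
0 through 7, which matches the example given in the Parquet format
documentation for this encoding.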

Single-node benchmarks showed a performance gain of a few percent.
16-node cluster benchmarks showed a gain only for TPC-H nested.

Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
M be/src/benchmarks/CMakeLists.txt
M be/src/benchmarks/
D be/src/benchmarks/
M be/src/exec/
M be/src/exec/parquet-column-readers.h
D be/src/experiments/bit-stream-utils.8byte.h
D be/src/experiments/bit-stream-utils.8byte.inline.h
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/dict-encoding.h
M be/src/util/
M be/src/util/
M be/src/util/rle-encoding.h
M be/src/util/
M testdata/data/README
A testdata/data/alltypes_agg_bitpacked_def_levels.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-def-levels.test
M tests/query_test/
20 files changed, 1,201 insertions(+), 946 deletions(-)

  git pull ssh:// refs/changes/67/8267/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
