impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Date Thu, 25 May 2017 07:59:53 GMT
Hello Marcel Kornacker, Impala Public Jenkins, Jim Apple, Alex Behm,

I'd like you to reexamine a change.  Please visit

to look at the new patch set (#18).

Change subject: IMPALA-5347: Parquet scanner microoptimizations

IMPALA-5347: Parquet scanner microoptimizations

A mix of microoptimizations that profiling the parquet scanner revealed.
All lead to some measurable improvement and added up to significant
speedups for various scans.

* Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining
  in all cases.
* Apply __restrict__ in a few places so the compiler knows that it is
  safe to cache values accessed via those pointers
* memset() the whole batch instead of the null indicators is cases where
  it is almost certainly cheaper.
* Avoid updating two correlated loop variables in MaterializeValueBatch().
* Avoid unnecessary initialization of often-unused 'val' in ReadSlot().
* Shave a few instructions off the (still very expensive) bit unpacking
  and dict decoding logic.


Some local TPC-H and targeted-perf benchmarks showed average speedups of

I did some benchmarks targeted at measuring column materialisation
performance using a version of lineitem with duplicated data to make
it bigger. These queries all got significantly faster.

Dict-encoded DECIMAL: 2.23 -> 1.23s

  SELECT count(*) FROM biglineitem WHERE l_quantity > 49

Plain-encoded BIGINT: 2.33s -> 1.62s

  SELECT count(*) FROM biglineitem WHERE l_orderkey != 10

Dict-encoded STRING: 2.73s -> 1.72s

  SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A'

Plain-encoded STRING: 7.07s -> 6.08s (most time spent in Snappy)
  SELECT count(*) FROM biglineitem WHERE length(l_comment) > 37

Multiple columns: 5.15s -> 3.74s

  SELECT count(*) FROM biglineitem
  WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100

Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
M be/src/common/compiler-util.h
M be/src/exec/
M be/src/exec/hdfs-scanner.h
M be/src/exec/
M be/src/runtime/tuple.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/dict-encoding.h
M be/src/util/rle-encoding.h
9 files changed, 114 insertions(+), 47 deletions(-)

  git pull ssh:// refs/changes/50/6950/18
To view, visit
To unsubscribe, visit

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Reviewer: anujphadke <>

View raw message