impala-reviews mailing list archives

From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner
Date Thu, 17 Nov 2016 00:55:51 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2494: Support for byte array-encoded decimals in Parquet scanner

Patch Set 1:

Commit Message:

Line 21:  * Tested computing SUM(col) for 1 billion distinct dictionary-encoded
> Can do - but where would that extra slowness really come from? I would have
I'm assuming you are measuring response time. The scanner does more work overall in your
dict-encoded experiment, so any difference in perf will be less pronounced because it affects
a relatively smaller portion of the total work. With plain encoding there is no "overhead"
of decoding the dictionary indexes and fetching the values from the dictionary. For a single
decimal column, the work of decoding the dict indexes and fetching their values should be
in the same ballpark as just populating the slot directly with plain encoding, so it seems
roughly 50% of the dict-encoded measurement is "noise".
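
To put rough numbers on the dilution argument above, here is a minimal sketch. The unit costs
are made up for illustration (they are not measurements), and the function name is
hypothetical. It assumes dict decoding costs about as much as populating the slot.

```python
def relative_slowdown(slot_cost, decode_overhead, extra_cost):
    """Fractional increase in per-value time caused by extra_cost.

    slot_cost:       cost of populating the slot (present in both encodings)
    decode_overhead: cost of decoding dict indexes and fetching values
                     (zero for plain encoding)
    extra_cost:      additional per-value work introduced by the patch
    All values are hypothetical unit costs, not measurements.
    """
    baseline = slot_cost + decode_overhead
    return extra_cost / baseline


# Assumption: the patch adds 0.1 units of work per value in both cases.
plain = relative_slowdown(slot_cost=1.0, decode_overhead=0.0, extra_cost=0.1)
dict_encoded = relative_slowdown(slot_cost=1.0, decode_overhead=1.0, extra_cost=0.1)

print(f"plain: {plain:.1%}, dict-encoded: {dict_encoded:.1%}")
```

Under these assumed costs the same absolute regression looks half as large in the dict-encoded
run, which is why a plain-encoded test should expose it more clearly.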

Line 23:  * No performance difference measured by introduction of extra
> No, but I can do. What do you expect to change?
I'm assuming you compared response times. With multi-threaded scans the loss in perf might
not be apparent.

With mt_dop=1 we're running the whole query in a single thread, so any slowdown along that
critical path should prominently affect response time.

Gerrit-MessageType: comment
Gerrit-Change-Id: If95171e65aa48f08b08b8e87f4555dc75e867977
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Henry Robinson <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes
