impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Date Wed, 01 Nov 2017 20:53:08 GMT
Tim Armstrong has posted comments on this change. ( )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Patch Set 7:


I think we'll need to do some more work on testing for the int32/64 patches - we don't have
pre-existing test files from parquet-mr. I think we'll have to generate some more test files
with parquet-mr for the other cases, and we could consider turning that code into a data generator
to generate more test files. From what I could tell Hive doesn't have a knob to generate some
of these alternative output encodings.

 I feel ok with the coverage since we have end-to-end tests then more exhaustive unit tests
for the various ways of encoding it.
File be/src/exec/parquet-common.h:
PS7, Line 391: fixed_len_size
Looked again. The variable name (and recycling the argument storage) is confusing. Maybe 'encoded_byte_size'?
File be/src/exec/
PS7, Line 33: int EncodeVarLenDecimal(const DECIMAL_TYPE& t, int fixed_len_size, uint8_t*
I took another look at the standard and it says that the minimum number of bytes required
to store the unscaled value should be used:

I think this means that we should not be including any preceding "0" bytes. I.e. we should
not have a fixed_len_size argument and instead determine the size based on the number of leading
zero bytes in the value.

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 7
Gerrit-Owner: Bikramjeet Vig <>
Gerrit-Reviewer: Bikramjeet Vig <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:53:08 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message