impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bikramjeet Vig (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Date Thu, 19 Oct 2017 20:34:56 GMT
Bikramjeet Vig has posted comments on this change. ( )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Patch Set 4:

File be/src/exec/
PS4, Line 1277:         switch (node.element->type) {
> This case is only going to get bigger with the follow on patches - it would
File be/src/exec/parquet-common.h:
PS4, Line 376: inline int ParquetPlainEncoder::Decode<Decimal4Value, parquet::Type::BYTE_ARRAY>(
> I think you can avoid stamping this out if you leave it still parameterized
Unfortunately, the call to Decode is explicit in our code, eg Decode<T,S> therefore
it will always look for a specialization of those template parameters.
Similarly, in the dummy program you wrote, if I explicitly call foo<int, double>(&a,
&x), this will look for a specialization of the double templated  method and return 0;

Since partial specialization is not possible, the only way to reduce redundant code will be
to do what Lars suggested earlier, i.e., create a helper method that is only templatized on
<typename T> and call that for each full specialization.
I will make changes accordingly in the next iteration of this patch.
File be/src/exec/parquet-common.h:
PS2, Line 237: 
> I think this particular call here will always return sizeof(int32_t) (line 
should we do that everywhere else in this file wherever we specialize a method(both Decode
and encode)?
PS2, Line 338:   return fixed_len_size;
> I'm not sure I'm following: it looks like the next three methods are exactl
Oh I see what you mean now. Done!
File be/src/exec/
PS4, Line 66: bool IsSupportedPhysicalType(PrimitiveType impala_type,
> let's make it static - we don't want to export the symbol to be linked with
Done. enclosed in an anonymous namespace.
PS4, Line 70: (
> unnecessary parens
PS4, Line 173:     // TODO: Is this check too strict?
> I don't see why this shouldn't match the file metadata, this seems valid to
PS4, Line 182:  parquet::SchemaElement*
> Why not pass by const ref like above?
PS4, Line 209:         stringstream ss;
> Can you convert to using Substitute() while we're here? We've been very gra
File be/src/exec/
PS4, Line 176:   TestType<Decimal4Value, parquet::Type::BYTE_ARRAY>(Decimal4Value(test_val),
> It would be nice to combine the FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY tests. 
PS4, Line 193: int32_t
> Any reason to use sizeof(int..) instead of sizeof(Decimal*Value) like above
for var len decimals, its encoded size is bytes required to store size + least num of bytes
required to store num.

in this case bytes required to store numeric_limits<int32_t>::max() is sizeof(int32_t)
which is less than the sizeof(Decimal8Value) since Decimal8Value is stored using int64

I will add a comment at the top of these byte_array test to clarify what expected size is.
would you recommend I add more comments before limit tests?
PS4, Line 212:   for (int i = 1; i <=16; ++i) {
> nit: missing space
File be/src/util/dict-encoding.h:
> Isn't this PHYSICAL_TYPE the internal type, which is called LOGICAL_TYPE in
reverted to 'T' to minimize change from previous code
> It doesn't seem like the whole decoding class needs to be templated by this
PS4, Line 323: DictDecoder<Decimal16Value, parquet::Type::BYTE_ARRAY>::GetNextValue(
> The logic here is also the same as above - we should find a way to avoid th
removed, as it is no longer needed since the template param is removed and moved to Reset()
File testdata/data/binary_decimal_dictionary.parquet:

> Can you update the README to describe how these files were generated.
this file was generated a long while back and I picked it up from an abandoned patch here 

Currently looking for a better way to generate this file using the java parquet client.

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig <>
Gerrit-Reviewer: Bikramjeet Vig <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Comment-Date: Thu, 19 Oct 2017 20:34:56 +0000
Gerrit-HasComments: Yes

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message