impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Date Fri, 20 Oct 2017 00:36:39 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 )

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
......................................................................


Patch Set 5:

(6 comments)

Mostly just cleanup and simplification comments now.

http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/hdfs-parquet-scanner.h@45
PS5, Line 45: INTERNAL_TYPE
I should have mentioned this earlier, but I think we mostly use CamelCase for type names inside
templates. We're not totally consistent throughout the codebase but that's the general pattern.


http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@328
PS5, Line 328: inline int ParquetPlainEncoder::Encode(
It looks like we could also deduplicate the Encode/Decode methods for FIXED_LEN_BYTE_ARRAY
with the same technique.


http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@350
PS5, Line 350: Decode<Decimal4Value,parquet::Type::FIXED_LEN_BYTE_ARRAY>(const uint8_t*
buffer,
nit: missing space


http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@193
PS4, Line 193: izeof(D
> for var len decimals, its encoded size is bytes required to store size + le
Thanks for the explanation.


http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc@112
PS5, Line 112: //    Decimal4Value: General test case,
I find this a bit confusing still because the values being encoded aren't in the table. Could
we take it even further and have something like:

  struct DecimalTestCase {
    int decimal_byte_size;
    int128_t decimal_val;
    int expected_size;
  }

That way all of the inputs to each test would be in the same place.


http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc@200
PS5, Line 200: sizeof(int32_t)
Is there any reason to keep the encoding overhead separate from the encoded size? I would
have thought we would just check the total size.



-- 
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 5
Gerrit-Owner: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mjacobs@apache.org>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 00:36:39 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message