impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Impala Public Jenkins (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Date Tue, 07 Nov 2017 04:34:27 GMT
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7822
)

Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
......................................................................

IMPALA-2494: Support for byte array encoded decimals in Parquet scanner

Extendes parquet column reader and associated classes to allow for more
than one possible physical type for a given logical type. This patch
only adds support for variable sized byte array encoded decimals and
more will be added in upcoming commits.
Also, column level metadata verification which was currently being
done per row group will now only be done once per column per file.

Testing:
Added backend test for verifying newly added decimal types are decoded
correctly.
Added Query test that decodes both plain and dictionary-encoded
decimals using binary encoding.

Performance:
Initial perf testing using tpcds_1000 shows no regression.

Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Reviewed-on: http://gerrit.cloudera.org:8080/7822
Reviewed-by: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M be/src/exec/parquet-plain-test.cc
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M testdata/data/README
A testdata/data/binary_decimal_dictionary.parquet
A testdata/data/binary_decimal_no_dictionary.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test
M tests/query_test/test_scanners.py
19 files changed, 634 insertions(+), 283 deletions(-)

Approvals:
  Bikramjeet Vig: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/7822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e
Gerrit-Change-Number: 7822
Gerrit-PatchSet: 12
Gerrit-Owner: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet.vig@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mjacobs@apache.org>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message