impala-reviews mailing list archives

From "Lars Volker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Date Fri, 05 May 2017 14:32:24 GMT
Lars Volker has uploaded a new patch set (#5).

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types

IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types

This change adds functionality to write parquet::Statistics for Decimal,
String, and Timestamp values.

It also switches from populating the deprecated fields 'min' and 'max'
to populating the new fields 'min_value' and 'max_value' in
parquet::Statistics, which were added in parquet-format pull request #46.

The HdfsParquetScanner prefers the new fields if they are populated and
if the column order 'TypeDefinedOrder' has been used to compute the
statistics. For columns without a column order set or with only the
deprecated fields populated, the scanner reads the statistics only if
the column is of a simple numeric type, i.e. boolean, integer, or
floating point.
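
The selection rule above can be sketched roughly as follows. This is a
minimal illustration, not the code in this patch: the types ColumnType,
ColumnOrder, Statistics and StatsSource and the function
ChooseStatsSource are hypothetical stand-ins for the Thrift-generated
structures and the scanner logic. How the scanner treats a numeric
column that has the new fields but no column order is not spelled out
above; the sketch assumes it falls back to the deprecated fields only.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the real Parquet/Impala types.
enum class ColumnType { BOOLEAN, INTEGER, FLOATING_POINT, DECIMAL, STRING, TIMESTAMP };
enum class ColumnOrder { UNSET, TYPE_DEFINED_ORDER };
enum class StatsSource { NONE, NEW_FIELDS, DEPRECATED_FIELDS };

struct Statistics {
  std::optional<std::string> min;        // deprecated
  std::optional<std::string> max;        // deprecated
  std::optional<std::string> min_value;  // new
  std::optional<std::string> max_value;  // new
};

// Simple numeric types are the only ones for which the deprecated
// fields are trusted.
inline bool IsSimpleNumeric(ColumnType t) {
  return t == ColumnType::BOOLEAN || t == ColumnType::INTEGER ||
         t == ColumnType::FLOATING_POINT;
}

// Decides which statistics fields, if any, a reader should use.
inline StatsSource ChooseStatsSource(const Statistics& stats, ColumnOrder order,
                                     ColumnType type) {
  const bool has_new = stats.min_value.has_value() && stats.max_value.has_value();
  const bool has_deprecated = stats.min.has_value() && stats.max.has_value();

  // Preferred path: new fields computed with a known column order.
  if (has_new && order == ColumnOrder::TYPE_DEFINED_ORDER) {
    return StatsSource::NEW_FIELDS;
  }
  // Fallback path: deprecated fields, simple numeric types only.
  // Assumption: new fields without a column order are not used.
  if (has_deprecated && IsSimpleNumeric(type)) {
    return StatsSource::DEPRECATED_FIELDS;
  }
  return StatsSource::NONE;
}
```

With this rule, a string column carrying only 'min' and 'max' yields
StatsSource::NONE, so its statistics are ignored, while an integer
column with the same fields uses the fallback.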

This change removes from the tests the comparison of the Parquet
statistics we write against those written by Hive, since Hive does not
write the new fields. Instead it adds a Parquet file written by Hive
that uses the deprecated fields for its statistics, and a test that uses
that file to exercise the fallback logic for supported types.

Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
M be/src/exec/
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/
M be/src/exec/
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/
M be/src/exec/parquet-metadata-utils.h
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/deprecated_statistics.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
M testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
M tests/query_test/
M tests/query_test/
15 files changed, 874 insertions(+), 229 deletions(-)

  git pull ssh:// refs/changes/63/6563/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Attila Jeges <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Tim Armstrong <>
