impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Date Fri, 20 Jan 2017 17:45:36 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer

Patch Set 2:

(1 comment)
File be/src/exec/parquet-column-stats.h:

Line 39: /// TIMESTAMP values are written in the in-memory format used by Impala, relative to UTC,
> It's not that Hive and parquet-mr do it differently, it's simply that there
I agree there's no logical timestamp type, but the physical type is still an INT96, not a
generic binary type. I see that parquet-mr internally uses a byte array to represent INT96,
but that's an implementation artifact of parquet-mr.

My reasons for thinking this is a bug:
* INT96 should be ordered numerically, in the same way as INT64 and INT32.
* Ordering INT96 by comparing its little-endian bytes is of little use for min-max pruning.
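
To illustrate the two points above, here is a minimal sketch of how a bytewise comparison of little-endian values can disagree with numeric order. It is not parquet-mr or Impala code: the helper name int96_le is hypothetical, and the values are treated as unsigned 96-bit integers purely for simplicity.

```python
def int96_le(value: int) -> bytes:
    """Encode a non-negative integer as 12 little-endian bytes (assumed layout)."""
    return value.to_bytes(12, byteorder="little", signed=False)


values = [1, 255, 256, 65536]

# Numeric ordering, as one would expect for INT32 and INT64.
numeric_min, numeric_max = min(values), max(values)

# Bytewise (lexicographic) ordering of the little-endian encodings,
# which is what comparing the values as generic binary amounts to.
encoded = [int96_le(v) for v in values]
bytewise_min = int.from_bytes(min(encoded), byteorder="little")
bytewise_max = int.from_bytes(max(encoded), byteorder="little")

print("numeric :", numeric_min, numeric_max)
print("bytewise:", bytewise_min, bytewise_max)
```

With these inputs the bytewise "min" is numerically larger than the bytewise "max", so a reader that prunes on such statistics cannot treat them as a numeric range.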

It seems like this code that creates a BinaryStatistics object for an INT96 is the culprit:


Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Reviewer: Zoltan Ivanfi <>
Gerrit-HasComments: Yes
