impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Date Fri, 20 Jan 2017 15:20:36 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 2:

> (1 comment)

Apologies for the delayed reply. Hive writes timestamps using 12 bytes using little endian.
Then it passes them to parquet-mr as a BINARY string, which means it is hitting PARQUET-251.
This explains why I saw the odd values for min/max in my tests.

Internally parquet-mr orders BINARY values using byte comparison, potentially leading to a
min/max value not being the semantically smallest/largest value of a set of values. I am inclined
to call this a bug in hive, but I'm curious to hear what you think about this.

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: No

Mime
View raw message