impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zoltan Ivanfi (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Date Thu, 19 Jan 2017 18:06:06 GMT
Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5611/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 39: /// TIMESTAMP values are written in the in-memory format used by Impala, relative
to UTC,
> I think we convert them to INT96 for parquet. I think the INT96 order is eq
Lars, if I remember correctly you found that Hive does not write statistics according to these
rules. Did you find out what logic they follow? Is it different than this implementation?
If so, will the parsing logic contain a condition that checks whether the stats were written
by Hive?


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message