impala-reviews mailing list archives

From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Date Tue, 31 Jan 2017 18:38:35 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 9:

(8 comments)

Thanks for the review. Please see my inline comments and PS11.

http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 339:   int64_t encoded_value_size_;
> sounds good
Done


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 139:   }
> why not just pass in metadata->statistics and then set the __isset flag by 
Done


Line 652:   // Add the size of the data page header
> avoid copy by passing in header.data_page_header.statistics
Done
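Here is a minimal sketch of the pattern suggested in the two comments above. It rests on some assumptions. The encoder writes min/max directly into a statistics struct owned by the caller (for example metadata->statistics or header.data_page_header.statistics). It also sets the __isset flags itself, so no temporary copy is built and then assigned. The Statistics struct is a hypothetical stand-in for the Thrift-generated parquet::Statistics, and EncodeMinMax is a hypothetical helper. Neither is the code in this change.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical stand-in for the Thrift-generated parquet::Statistics.
// Thrift's C++ generator tracks optional fields in an '__isset' member.
struct Statistics {
  std::string min;
  std::string max;
  struct Isset {
    bool min = false;
    bool max = false;
  } __isset;
};

// Hypothetical helper: copies the raw bytes of fixed-width values straight
// into the caller-owned struct and marks both fields as set, so no local
// Statistics object has to be copied afterwards. Copying in-memory bytes
// matches Parquet's little-endian PLAIN encoding only on a little-endian
// host, which this sketch assumes.
template <typename T>
void EncodeMinMax(const T& min_value, const T& max_value, Statistics* out) {
  assert(out != nullptr);
  out->min.resize(sizeof(T));
  std::memcpy(&out->min[0], &min_value, sizeof(T));
  out->max.resize(sizeof(T));
  std::memcpy(&out->max[0], &max_value, sizeof(T));
  out->__isset.min = true;
  out->__isset.max = true;
}
```

A call such as EncodeMinMax(min, max, &header.data_page_header.statistics) would fill the page header's statistics in place.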


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

Line 103:   /// Maximum statistics size. If the combined size of the min and max values of
> qualify as 'parquet.Statistics' so it's clearer
Done. I used :: since that is the class name in its namespace. Do you prefer "."?


http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 127:       // statistics behavior from any implicit behavior of the types?
> i understand how that may not be the case today, but in order for them to b
If Parquet's and Impala's orderings were only "roughly the same", we would need some translation
between our min/max values and the ones Parquet expects. For our current types I don't see that
as a problem either, but I think Tim was concerned that types added in the future could introduce
subtle bugs.

I'll let Tim add his thoughts to the discussion; personally, I'm fine with using min/max for
now. The comment was there to facilitate this discussion, since it came up in reviews of previous
patch sets. I will remove it.


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 84: 
> remove
Without these, clang-format will undo all manual style changes on lines modified by this
change. I added a TODO to the commit message to remove them once the change has a +2, since
I will have to rebase it then anyway.


Line 87: class ColumnStats : public ColumnStatsBase {
> indent the subsequent lines belonging to the logical expr two more spaces (
Done


Line 157:   /// Returns the number of bytes needed to encode value 'v'.
> this is very verbose. why needed?
See my previous comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes
