impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Date Tue, 31 Jan 2017 16:17:04 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer

Patch Set 10:

File be/src/exec/

Line 136:     if (row_group_stats_base_->has_values()
oops, i missed that bug

Line 339: 
> Yes, the dictionary encoders use it as the size of the encoded values. Shou
sounds good
File be/src/exec/

Line 139:       row_group_stats_base_->EncodeToThrift(&encoded);
why not just pass in metadata->statistics and then set the __isset flag by hand?

Line 652:     page_stats_base_->EncodeToThrift(&encoded);
avoid copy by passing in header.data_page_header.statistics
File be/src/exec/hdfs-parquet-table-writer.h:

Line 103:   /// Maximum statistics size. If the size of a single thrift Statistics struct
for a page
qualify as 'parquet.Statistics' so it's clearer
File be/src/exec/parquet-column-stats.h:

Line 127:     }
> The stats should follow the ordering semantics defined in parquet-format. T
i understand how that may not be the case today, but in order for them to be useful, they
need to mirror the ordering semantics of the underlying type. meaning i still don't see why
there should be stats-specific comparisons and why we want this todo.
File be/src/exec/parquet-column-stats.h:

Line 84:   // clang-format off

Line 87:       || std::is_same<bool, T>::value
indent the subsequent lines belonging to the logical expr two more spaces (style guide)

Line 157:     // clang-format off
this is very verbose. why needed?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Michael Brown <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Reviewer: Zoltan Ivanfi <>
Gerrit-HasComments: Yes

View raw message