impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4817: Populate Parquet Statistics for Strings
Date Mon, 10 Apr 2017 15:33:01 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4817: Populate Parquet Statistics for Strings

Patch Set 2:

File be/src/exec/

Line 541:     const string* thrift_stats = nullptr;
bad variable name: this is a plain-encoded value. thrift_stats sounds like it's a struct (it's
definitely not something that requires a plural).

Line 546:       thrift_stats = ParquetMetadataUtils::GetThriftStats(
why did you break this up instead of having ReadFromThrift do the extra work? do you need
GetThriftStats anywhere else? the old control flow was easier to follow.

Line 556:     if (!thrift_stats) continue;
explicit comparison
File be/src/exec/parquet-column-stats.h:

Line 31: /// This class, together with its derivatives, is used to track column statistics
track is really not a meaningful term here, it generally just means 'follow'. revise description.

Line 36: /// We currently support tracking 'min_value' and 'max_value' values for statistics.
hopefully also min and max, no? tracking means reading/decoding.

Line 46: /// We currently don't write statistics for DECIMAL values and TIMESTAMP values due
why is that still not the case?

Line 66:       const string& thrift_stats, const ColumnType& col_type, void* slot);
you changed the type and meaning of a parameter, but you didn't change the name.

Line 72:   /// Creates a copy of the contents of this object. Some data types (e.g. StringValue)
unclear what this does, because copies are usually returned or passed into something.

Line 146: /// This class contains further type-specific behavior that is common only to a
subset of
why can't this be collapsed into a 2-level hierarchy?

Line 151:  protected:
protected follows public section
File be/src/exec/parquet-column-stats.inline.h:

Line 34: inline int64_t TypedColumnStatsBase<T>::BytesNeeded() const {
hard to compare, please move the functions back to where they were. feel free to reorder at
the end of the review cycle.
File be/src/exec/

Line 243:     int col_idx, const StatsField& stats_field, const ColumnType& col_type)
col_idx is unused.

instead of passing in both the columnchunk and columntype, why not use col_chunk.meta_data.type?
File be/src/exec/parquet-metadata-utils.h:

Line 60:   static bool ReadOldStats(const ColumnType& col_type);
this sounds like it's doing some reading.

also, 'deprecated' instead of old.
File common/thrift/parquet.thrift:

Line 344:   BROTLI = 4;
do we return an error when we see that codec?

Line 567: /** Union containing the order used for min, max, and sorting values in a column
why isn't this implied by the logical type of that column?

if this is simply to mark "legacy" ordered columns, then why not simply have a bool here?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-HasComments: Yes

View raw message