impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4817: Populate Parquet Statistics for Strings
Date Mon, 10 Apr 2017 15:33:01 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4817: Populate Parquet Statistics for Strings
......................................................................


Patch Set 2:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 541:     const string* thrift_stats = nullptr;
bad variable name: this is a plain-encoded value. thrift_stats sounds like it's a struct (it's
definitely not something that requires a plural).


Line 546:       thrift_stats = ParquetMetadataUtils::GetThriftStats(
why did you break this up instead of having ReadFromThrift do the extra work? do you need
GetThriftStats anywhere else? the old control flow was easier to follow.


Line 556:     if (!thrift_stats) continue;
explicit comparison


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 31: /// This class, together with its derivatives, is used to track column statistics
when
track is really not a meaningful term here, it generally just means 'follow'. revise description.


Line 36: /// We currently support tracking 'min_value' and 'max_value' values for statistics.
The
hopefully also min and max, no? tracking means reading/decoding.


Line 46: /// We currently don't write statistics for DECIMAL values and TIMESTAMP values due
to
why is that still not the case?


Line 66:       const string& thrift_stats, const ColumnType& col_type, void* slot);
you changed the type and meaning of a parameter, but you didn't change the name.


Line 72:   /// Creates a copy of the contents of this object. Some data types (e.g. StringValue)
unclear what this does, because copies are usually returned or passed into something.


Line 146: /// This class contains further type-specific behavior that is common only to a
subset of
why can't this be collapsed into a 2-level hierarchy?


Line 151:  protected:
protected follows public section


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

Line 34: inline int64_t TypedColumnStatsBase<T>::BytesNeeded() const {
hard to compare, please move the functions back to where they were. feel free to reorder at
the end of the review cycle.


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

Line 243:     int col_idx, const StatsField& stats_field, const ColumnType& col_type)
{
col_idx is unused.

instead of passing in both the columnchunk and columntype, why not use col_chunk.meta_data.type?


http://gerrit.cloudera.org:8080/#/c/6563/2/be/src/exec/parquet-metadata-utils.h
File be/src/exec/parquet-metadata-utils.h:

Line 60:   static bool ReadOldStats(const ColumnType& col_type);
this sounds like it's doing some reading.

also, 'deprecated' instead of old.


http://gerrit.cloudera.org:8080/#/c/6563/2/common/thrift/parquet.thrift
File common/thrift/parquet.thrift:

Line 344:   BROTLI = 4;
do we return an error when we see that codec?


Line 567: /** Union containing the order used for min, max, and sorting values in a column
why isn't this implied by the logical type of that column?

if this is simply to mark "legacy" ordered columns, then why not simply have a bool here?


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message