impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Date Mon, 24 Apr 2017 22:32:05 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining
types
......................................................................


Patch Set 4:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/6563/4/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 375:   scoped_ptr<ColumnStats<T>> page_stats_;
why this change?


http://gerrit.cloudera.org:8080/#/c/6563/4/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 34: /// decode parquet::Statistics into slots.
update


Line 60:   /// Enum to select whether to read minimum or maximum statistics. Values to not
"do not"


Line 73:       const ColumnType& col_type, const StatsField& stats_field, void* slot);
const& isn't necessary here


Line 82:   /// provide the functionality.
is this useful for anything other than strings/var-len data? if not, make name more specific.


Line 83:   virtual void MaterializeValuesToInternalBuffer(){};
) {}


Line 105:   static bool ShouldUseDeprecatedStats(const ColumnType& col_type);
CanUse-? or should as in 'would be preferable'/'there's a moral imperative'?


Line 108: /// This class contains the type-specific behavior to track statistics per column.
i had to remind myself again what T was. would be useful to point out in the class comment
that this is for the types of our in-memory format.


Line 127:   ColumnStats(MemPool* mem_pool, int plain_encoded_value_size)
describe params


Line 135:   /// statistics.
you should point out that it may keep a reference to 'v' until Materialize..() got called.


Line 144:   static bool ReadFromThrift(const string& thrift_stats, void* slot);
this doesn't need to be part of the public interface, columstatsbase can be a friend.


Line 152:   void EncodeValueToString(const T& v, std::string* out) const;
static


Line 154:   /// Decodes a statistics values from 'buffer' into 'result'.
you don't mention here that this is plain-encoded.

since they both convert between our format and plain, why don't you name them uniformly and
make clear that that's what happens.

unless that isn't true, but then you should still call it something like '..To/FromThrift'
for the sake of uniformity.


Line 158:   int64_t BytesNeededInternal(const T& v) const;
why -internal?


Line 161:   static void CopyToBufferInternal(StringBuffer* buffer, StringValue* value);
why -internal?


http://gerrit.cloudera.org:8080/#/c/6563/4/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

Line 28: inline bool ColumnStats<T>::ReadFromThrift(const string& thrift_stats,
void* slot) {
again, you changed the meaning of a variable but not the name.

please pay attention to that yourself.


Line 30:   return DecodeValueFromThrift(thrift_stats, out);
this whole function does almost nothing. is it worth having it (extra level of indirection
= more obscurity) instead of just plugging that code into the call sites? it would still be
a single statement.


Line 189: template <typename T>
why do you need T here?


http://gerrit.cloudera.org:8080/#/c/6563/4/testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test:

Line 138: #       widerow table here.
stray line?


http://gerrit.cloudera.org:8080/#/c/6563/4/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 290: select count(*) from functional_parquet.alltypessmall where string_col < "1"
use a predicate that will result in filter? (so we can see it also works with <)


http://gerrit.cloudera.org:8080/#/c/6563/4/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 60:     # 'timetuple', the datetime __eq__ function will forward an unknown equality
check to
'timetuple' will the datetime __eq__ function forward...


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message