impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Date Fri, 05 May 2017 14:58:54 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining

Patch Set 6:


Thank you for your review. Please see PS6 and my inline comments.
File be/src/exec/

Line 375:   scoped_ptr<ColumnStats<T>> page_stats_;
> why this change?
The stats now need to have their memory pool reset in Reset() (L297), so I followed the surrounding
code (DictEncoder). Should we instead keep the objects and pass the memory pool to page_stats_.Reset()?
File be/src/exec/parquet-column-stats.h:

Line 34: /// decode parquet::Statistics into slots.
> update

Line 60:   /// Enum to select whether to read minimum or maximum statistics. Values do not
> "do not"

Line 73:       const ColumnType& col_type, const parquet::ColumnOrder* col_order,
> const& isn't necessary here

Line 82:   /// batch, since the batch memory will be released. Overwrite this method in derived
> is this useful for anything other than strings/var-len data? if not, make n

Line 83:   /// classes to provide the functionality.
> ) {}
git-clang-format does this. If you feel strongly about it, I'll add a TODO for myself to add
the space before submitting the change.

Line 105:   /// 'col_order'. Otherwise, returns false.
> CanUse-? or should as in 'would be preferable'/'there's a moral imperative'

Line 108: 
> i had to remind myself again what T was. would be useful to point out in th

Line 127:         || std::is_same<bool, T>::value
> describe params

Line 135:  public:
> you should point out that it may keep a reference to 'v' until Materialize.

Line 144:       min_buffer_(mem_pool),
> this doesn't need to be part of the public interface, columstatsbase can be

Line 152:   virtual void Merge(const ColumnStatsBase& other) override;
> static
Done. Making it static required passing an additional parameter.

Line 154: 
> you don't mention here that this is plain-encoded.

Line 158:  protected:
> why -internal?

Line 161:   static void EncodePlainValue(const T& v, int64_t bytes_needed, std::string*
> why -internal?
File be/src/exec/parquet-column-stats.inline.h:

Line 28: inline void ColumnStats<T>::Update(const T& v) {
> again, you changed the meaning of a variable but not the name.

Line 30:     has_values_ = true;
> this whole function does almost nothing. is it worth having it (extra level

Line 189:     }
> why do you need T here?
It seemed cleaner to me to define it for all types. Removing T required moving the declaration
before its first usage.
File common/thrift/parquet.thrift:

Line 563:   2: required i64 total_byte_size
> this has now changed, please take a look at
File testdata/workloads/functional-query/queries/QueryTest/parquet-deprecated-stats.test:

Line 138: select count(*) from deprecated_stats where int_col < 0 and timestamp_col !=
"2009-02-01 00:00:00"
> stray line?
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 290: select count(*) from functional_parquet.alltypessmall where string_col < "1"
> use a predicate that will result in filter? (so we can see it also works wi
I'm not sure I understand your comment. The test in L274 filters all rows using "<". Can
you post the query you have in mind?
File tests/query_test/

Line 60:     # 'timetuple' will the datetime __eq__ function forward an unknown equality check
> 'timetuple' will the datetime __eq__ function forward...

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Attila Jeges <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message