impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Date Tue, 13 Sep 2016 00:00:49 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
......................................................................


Patch Set 3:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/4371/2//COMMIT_MSG
Commit Message:

PS2, Line 15: tr
> remove a
Done


PS2, Line 18: 
> remove, this is implied if it is a runtime profile counter
Done


PS2, Line 21: 
> serialize ... to thrift
Done


http://gerrit.cloudera.org:8080/#/c/4371/2/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS2, Line 214:  {
> I'd be nice to have a simpler name rather than enumerate the stats this kee
Done


PS2, Line 223:    current_sum_(0) {
             :   }
> space between these
Done


PS2, Line 226: 64_t min_value() const { return current_min_; }
             :   int64_t max_value(
> initialize min and max as well
Done


PS2, Line 230:   /// seen so far.
             :   void UpdateCounter(int64_t new_value);
             : 
> 1 line
Done


PS2, Line 234:   /// Set() and Add() should not be used.
             :   virtual void Set(double value) { DCHECK(false); }
             :   v
> 1 line
Done


PS2, Line 238: 
             :   /// This functio
> this can be one sentence
Done


PS2, Line 240: t TSummaryStatsCount
> This is a bit confusing since it starts to look like the AveragedCounter wh
Yes that's right. The only reason I kept it as a Counter instead of a value was to compare
if the units of both the values were equal.

But if you foresee it being used even without a counter as a source, it makes sense to just
take a value here.

Changed it.


PS2, Line 244:  private:
             :   /// This keeps track of the total number of values seen so far.
             :   int32_t total_num_values_;
             : 
             :   /// Current sums of values seen so far and min/max values seen so far.
             :   int64_t current_min_;
             :   int64_t current_max_;
             :   int64_t current_sum_;
             : 
             :   SpinLock counter_lock_;
             : };
> remove extra space, each fn can be 1 line. I know this is what happens abov
Done


http://gerrit.cloudera.org:8080/#/c/4371/2/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS2, Line 303: if (it == summary_stats_map_.end()) {
> if this counter already existed, shouldn't the values be updated?
Yes, I initially thought that counters will always have the same value when they arrive through
reports, however, that needn't be true for long running queries.

I've added a method Overwrite() to take care of this.


Line 661:   }
> can you add a comment with how this looks printed out?
Done


PS2, Line 794: 
             :   {
> vector can be constructed with a size parameter to do this in 1 step.
Done


Line 1008:   ++total_num_values_;
> why is this safe? couldn't current_sum_ become 0 at an arbtrary time (if a 
Done


http://gerrit.cloudera.org:8080/#/c/4371/2/common/thrift/RuntimeProfile.thrift
File common/thrift/RuntimeProfile.thrift:

PS2, Line 56: TSummaryStatsCounter {
> TSummaryStatsCounter
Done


PS2, Line 59: sum
> sum?
Done


http://gerrit.cloudera.org:8080/#/c/4371/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS2, Line 343:       # The chance that the times taken by different scan ranges to process
the footer,
             :       # are equal, is very unlikely. However, it could happen which could make
the min and
             :       # max time to process a footer equal. Assert that at a time, no more
than 1 node has
             :       # equal min and max time.
             :       assert num_equal < 2
> this seems brittle...
The reason I didn't do that is because the times from the runtime profile are listed as such:
"330.23ms, 1s23ms, 22ns", etc.

There's no good way to compare them without passing them through a helper function to standardize
their values before comparison. However, I thought that adding that helper function might
clutter this file.

If you still think it's worth it, I can go ahead and do it.


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message