impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Date Thu, 15 Sep 2016 04:38:53 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
......................................................................


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4371/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS4, Line 196: RuntimeProfile::Counter single_footer_process_timer(TUnit::TIME_NS);
> this is unnecessary, SCOPED_TIMER below just creates a ScopedTimer which cr
Done


http://gerrit.cloudera.org:8080/#/c/4371/4/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS4, Line 49: ADD_MIN_MAX_AVG_VALUE_TIMER
> please rename
Sorry, forgot to get this one. Done.


PS4, Line 70: ADD_MIN_MAX_AVG_VALUE_TIMER
> same
Done


PS4, Line 147: .
> can you add:
When you say 'child counters', do you mean it in the terms of the child counters of counters
in the RuntimeProfile? If so, I think it wouldn't be best to assume how it would be used (even
though that's the only use for it now).

Or do you mean every counter that the AveragedCounter keeps a track of is what we are calling
a 'child counter'?


PS4, Line 204: /// Contrary to the AveragedCounter, this only keeps a track of values and
not 'Counter'
             : /// objects themselves.
> this is a bit misleading because it doesn't really keep values, it keeps th
Good call. I rephrased it, but I left out the bit about 'child counters'. I'll add it if we
reach consensus on the above comment.


PS4, Line 249:  /// Current sums of values seen so far and min/max values seen so far.
             :   int64_t current_min_;
             :   int64_t current_max_;
             :   int64_t current_sum_;
> current is unnecessary
Done


http://gerrit.cloudera.org:8080/#/c/4371/4/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS4, Line 301:       const TSummaryStatsCounter& c = node.summary_stats_counters[i];
             :       SummaryStatsCounterMap::iterator it = summary_stats_map_.find(c.name);
             :       if (it == summary_stats_map_.end()) {
             :         summary_stats_map_[c.name] =
             :             pool_->Add(new SummaryStatsCounter(
             :                 c.unit, c.total_num_values, c.min_value, c.max_value, c.sum));
             :       } else {
             :         it->second->SetStats(c);
             :       }
> it's hard to know if this gets exercised in your test case.
Done


PS4, Line 667:  
> super nit: extra space
Done


Line 668:     for (const SummaryStatsCounterMap::value_type& v: summary_stats_map_) {
> I think we need a separate case to handle when total_num_values is 0 becaus
Done


PS4, Line 669: (
> this brace isn't closed at the end
Done


PS4, Line 676: v.second->unit()
> this isn't in the same Counter's units (which is time for the use case you 
Oops, yes. Done.


PS4, Line 1017:  else if
> this isn't an else if, you need to execute both branches
Done


http://gerrit.cloudera.org:8080/#/c/4371/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS4, Line 336: MinMaxAvgValueCounter
> update
Done


PS4, Line 337: averages the time taken to
             :     # scan and process the Parquet footer across scan ranges and also records
the min and
             :     # the max time.
> keeps statistics for the time taken to scan and Process the Parquet footer 
Done


PS4, Line 340:     if ranges_per_node > 1:
             :       num_equal = 0
             :       for min_max_time in footer_processing_time_list:
             :         num_equal += (min_max_time[0] == min_max_time[1])
             :       # The chance that the times taken by different scan ranges to process
the footer,
             :       # are equal, is very unlikely. However, it could happen which could make
the min and
             :       # max time to process a footer equal. Assert that at a time, no more
than 1 node has
             :       # equal min and max time.
             :       assert num_equal < 2
> following up with my previous comment, I still think this is weird and was 
I've removed the multi scan range test and just added a simple verify assert to see if all
the values are the same if we have only one range per node.

I've added BE unit tests to verify that the counter works properly, so we don't need to do
that again here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message