aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zameer Manji <zma...@apache.org>
Subject Re: Review Request 53965: track percentile values for timed sliding stats
Date Tue, 22 Nov 2016 18:25:31 GMT


> On Nov. 22, 2016, 10:07 a.m., David McLaughlin wrote:
> > Thanks for the patch! I think percentiles are crucial for analyzing scheduling performance
and I'm pretty excited about having them for our latencies. However, I'm not sure if it makes
sense to patch our local fork of twitter.commons. It might be better to either:
> > 
> > a) Upstream these changes to twitter.commons (where we might get better feedback
on the approach here) where the tests for the code actually runs, and then copy them over
into our fork 
> > 
> > OR
> > 
> > b) Move off twitter-commons stats and use something like util-stats (https://github.com/twitter/util/tree/develop/util-stats)
instead. 
> > 
> > 
> > The key advantage of (b) is that we'd be using a library that is very lightweight
but has both percentiles and can support multiple granularities (e.g. for per second metrics
for graphview, but minutely metrics for long term storage) and has a migration strategy (I've
done this migration internally at Twitter myself). 
> > 
> > I'm not sure what the implications of using (b) would be in terms of our dependency
graph, though. AFAIK we forked twitter-commons originally because of that.
> 
> Joshua Cohen wrote:
>     I know Zameer had some intentions of modernizing our stats suite and had been doing
some early work that was laying the groundwork for replacing Twitter Commons[1][2]. I'm definitely
in favor of reducing our usage of Twitter Commons, replacing it with something that's actively
maintained wherever possible (e.g. the recent Curator work to replace our ZK singleton service).
So given that, I definitely don't want to contribute this patch back upstream to Twitter Commons;
I think that ship sailed when we forked it. I'm ok with a stop-gap approach where we update
Percentile in our fork with a long term goal of moving to some other stats library, whatever
that may be.
>     
>     (And for what it's worth, the tests for Twitter Commons definitely run in our fork,
I verified this locally by adding a `fail()` to `PercentileTest`, running `gradlew build`
then failed as expected).
>     
>     
>     [1] https://github.com/apache/aurora/commit/059b08621ae4892d98954a6cd0e88f274cdd1bef
>     [2] https://github.com/apache/aurora/commit/9b34a4036ab84f560ca80dfce9bdcd831efdeb30

I was hoping for us to migrate to http://metrics.dropwizard.io/3.1.0/ which is a very widely
used and popular metrics library. On par on how Curator is the ZK library for Java.

I believe it is possible for us to hide metrics behind the `StatsProvider` interface.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53965/#review156595
-----------------------------------------------------------


On Nov. 22, 2016, 8:35 a.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53965/
> -----------------------------------------------------------
> 
> (Updated Nov. 22, 2016, 8:35 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Mehrdad Nurolahzade, and Santhosh
Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The timing of many methods are sampled with @Timed annotation. Currently SlidingSatas
is being used to track total-msec-in-method-x and times-invoked-of-method-x and then compute
the mean-msec-in-method-x-per-invocation. However mean values are known not to represent distrubtions
with high skewness. 
> 
> This is request for review feedback for changes on computing stats, esp. timing stats
by adding percentile.
> 
> This patch introduces `SlidingTimedStats`, which uses `SlidingStats` to track the mean
mean-msec-in-method-x-per-invocation for events in the last K sampling windows and `Percentiles`
to track the p-values for msec-in-method-x for all invocations in the last K windows.
> 
> __changes overview__:
> - Introduced `SlidingTimedStats` to measure percentiles on all @Timed method calls
> - Comments added for a better way to calculate ratio based on two rates (num and denom)
in `SlidingStats`.
> - Tests added
> 
> _sample new stat:_
> attribute_store_fetch_one_10_0_percentile 0.0
> attribute_store_fetch_one_50_0_percentile 0.0
> attribute_store_fetch_one_90_0_percentile 0.0
> attribute_store_fetch_one_99_0_percentile 0.0
> attribute_store_fetch_one_99_9_percentile 0.0
> attribute_store_fetch_one_99_99_percentile 0.0
> 
> 
> Diffs
> -----
> 
>   commons/src/main/java/org/apache/aurora/common/inject/TimedInterceptor.java a4c4240b67cafb108ca587ad86d9fb9d9bb1553d

>   commons/src/main/java/org/apache/aurora/common/stats/SlidingStats.java f7a5ae41e307627fc55157758e9b7cdd861c3268

>   commons/src/main/java/org/apache/aurora/common/stats/SlidingTimedStats.java PRE-CREATION

>   commons/src/test/java/org/apache/aurora/common/stats/SlidingTimedStatsTest.java PRE-CREATION

> 
> Diff: https://reviews.apache.org/r/53965/diff/
> 
> 
> Testing
> -------
> 
> SlidingTimedStatsTest
> > passed
> 
> Seems like gradle is not set to run tests on `common`. Tests passed in IDEA.
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message