aurora-reviews mailing list archives

From "Bill Farner" <wfar...@apache.org>
Subject Re: Review Request 30647: Instrument the HealthChecker to export stats.
Date Wed, 18 Feb 2015 03:47:58 GMT


> On Feb. 18, 2015, 1:07 a.m., Bill Farner wrote:
> > src/main/python/apache/aurora/executor/common/health_checker.py, line 151
> > <https://reviews.apache.org/r/30647/diff/7/?file=866744#file866744line151>
> >
> >     What's the intended use of this metric?  Since it's exported as a gauge, it's
> >     lossy depending on the poll frequency.
> >
> >     Unless there's a concrete use, I suggest killing this.
> 
> Brian Wickman wrote:
>     What would your suggestion be?  Average latency?  There are no implementations of
>     gauge aggregations or anything like that in the python twitter.common.metrics.  I think it
>     can still be valuable to see health check latency.  If 50% of your fleet is reporting 200ms
>     health check intervals it probably indicates a problem?  Or if that number consistently goes
>     up.

How about something that you can use with a rate ratio?  This would mean one monotonic counter
for the number of health checks, and another accumulating total latency.
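
To make that concrete, here is a rough sketch of the two-counter idea in plain Python.
It deliberately does not use twitter.common.metrics; the class, function, and field names
are all hypothetical and are not part of the patch under review. Both counters only ever
increase, so a poller can take the difference between any two samples and divide to get
the mean latency over that window, regardless of poll frequency.

```python
class HealthCheckStats(object):
    """Two monotonic counters (hypothetical names, not from the patch)."""

    def __init__(self):
        self.checks = 0           # total number of health checks performed
        self.latency_secs = 0.0   # total time spent in health checks, in seconds

    def record(self, elapsed_secs):
        # Called once per health check; both values only ever increase.
        self.checks += 1
        self.latency_secs += elapsed_secs

    def snapshot(self):
        # What a metrics poller would sample at each interval.
        return (self.checks, self.latency_secs)


def average_latency(prev, curr):
    """Mean latency per check between two snapshots, or None if no checks ran."""
    delta_checks = curr[0] - prev[0]
    delta_latency = curr[1] - prev[1]
    if delta_checks == 0:
        return None
    return delta_latency / delta_checks
```

A gauge of the last observed latency drops every sample between polls, whereas the
ratio of the two deltas accounts for every check that ran in the window.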


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30647/#review72856
-----------------------------------------------------------


On Feb. 18, 2015, 1 a.m., Brian Wickman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30647/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2015, 1 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Bill Farner.
> 
> 
> Bugs: AURORA-1062
>     https://issues.apache.org/jira/browse/AURORA-1062
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Instrument the HealthChecker to export stats.
> 
> The HealthChecker plugin should now export three stats:
>   consecutive_failures: number of consecutive failures experienced (resets on success)
>   latency: how long health checks are taking in practice
>   snoozed: whether or not the health checker is snoozed
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/common/health_checker.py 60676ba0fbd8a218fe4309f07de28e2c66d54530
>   src/main/python/apache/aurora/executor/common/status_checker.py 624921d68199df098ea51ee8a10815403bf58984
>   src/test/python/apache/aurora/executor/common/test_health_checker.py a4e215d4422e3ada7b7913eaab105fdf030695c5
>   src/test/python/apache/aurora/executor/test_thermos_executor.py c8fab307d17949a8157659c4b3944ec7520feb9d
> 
> Diff: https://reviews.apache.org/r/30647/diff/
> 
> 
> Testing
> -------
> 
> ./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common::
> 
> 
> Thanks,
> 
> Brian Wickman
> 
>

