aurora-reviews mailing list archives

From "Brian Wickman" <wick...@apache.org>
Subject Re: Review Request 30647: Instrument the HealthChecker to export stats.
Date Wed, 18 Feb 2015 01:12:22 GMT


> On Feb. 18, 2015, 1:07 a.m., Bill Farner wrote:
> > src/main/python/apache/aurora/executor/common/health_checker.py, line 151
> > <https://reviews.apache.org/r/30647/diff/7/?file=866744#file866744line151>
> >
> >     What's the intended use of this metric?  Since it's exported as a gauge, it's
> >     lossy depending on the poll frequency.
> >
> >     Unless there's a concrete use, I suggest killing this.

What would your suggestion be?  Average latency?  There are no implementations of gauge
aggregations or anything like that in the Python twitter.common.metrics library.  I think it
can still be valuable to see health check latency.  If 50% of your fleet is reporting 200ms
health check latencies, that probably indicates a problem, as would that number consistently
going up.
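To make the lossiness concrete: a gauge exposes only its most recent value, so a poller that samples less often than checks run never sees the intermediate latencies. A cumulative (total, count) pair loses nothing, and any poller can derive the average over its own polling window from two snapshots. This is a minimal stdlib-only sketch; the class and function names are hypothetical and are not the twitter.common.metrics API.

```python
class LastValueGauge(object):
    """Hypothetical gauge: keeps only the most recent latency sample."""

    def __init__(self):
        self.value = 0

    def record(self, latency_ms):
        self.value = latency_ms  # earlier samples are overwritten

    def read(self):
        return self.value


class CumulativeLatency(object):
    """Hypothetical counter pair: a running total and count never drop samples."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def record(self, latency_ms):
        self.total += latency_ms
        self.count += 1

    def read(self):
        return self.total, self.count


def average_between(before, after):
    """Mean latency of the checks run between two (total, count) snapshots."""
    total = after[0] - before[0]
    count = after[1] - before[1]
    return float(total) / count if count else None


# Five health checks run between two polls; one of them is slow.
gauge = LastValueGauge()
cumulative = CumulativeLatency()
before = cumulative.read()
for latency_ms in [10, 10, 500, 10, 10]:
    gauge.record(latency_ms)
    cumulative.record(latency_ms)

print(gauge.read())                                  # 10: the 500ms check is invisible
print(average_between(before, cumulative.read()))    # 108.0
```

The trade-off is that the cumulative form needs the consumer to diff snapshots, while the gauge can be read directly.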


- Brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30647/#review72856
-----------------------------------------------------------


On Feb. 18, 2015, 1 a.m., Brian Wickman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30647/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2015, 1 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Bill Farner.
> 
> 
> Bugs: AURORA-1062
>     https://issues.apache.org/jira/browse/AURORA-1062
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Instrument the HealthChecker to export stats.
> 
> The HealthChecker plugin should now export three stats:
>   consecutive_failures: number of consecutive failures experienced (resets on success)
>   latency: how long health checks are taking in practice
>   snoozed: whether or not the health checker is snoozed
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/common/health_checker.py 60676ba0fbd8a218fe4309f07de28e2c66d54530
>   src/main/python/apache/aurora/executor/common/status_checker.py 624921d68199df098ea51ee8a10815403bf58984
>   src/test/python/apache/aurora/executor/common/test_health_checker.py a4e215d4422e3ada7b7913eaab105fdf030695c5
>   src/test/python/apache/aurora/executor/test_thermos_executor.py c8fab307d17949a8157659c4b3944ec7520feb9d
> 
> Diff: https://reviews.apache.org/r/30647/diff/
> 
> 
> Testing
> -------
> 
> ./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common::
> 
> 
> Thanks,
> 
> Brian Wickman
> 
>
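The three stats listed in the review description (consecutive_failures, latency, snoozed) can be sketched as a small stats holder. This is a hypothetical, stdlib-only illustration, not the code in health_checker.py: the `HealthCheckStats` class, the `run_check` and `snapshot` methods, and the injectable clock are all assumptions, and `snoozed` is simply a flag the caller sets.

```python
import time


class HealthCheckStats(object):
    """Hypothetical holder for the three stats described in the review.

    consecutive_failures resets to 0 on any successful check, latency is the
    duration of the most recent check, and snoozed is a flag set by the caller.
    """

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so tests can use a fake clock
        self.consecutive_failures = 0
        self.latency = 0
        self.snoozed = False

    def run_check(self, check):
        """Run check() (returns True when healthy), timing it and updating stats."""
        start = self._clock()
        healthy = check()
        self.latency = self._clock() - start
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return healthy

    def snapshot(self):
        """Values a metrics exporter might poll; snoozed is exported as 0/1."""
        return {
            'consecutive_failures': self.consecutive_failures,
            'latency': self.latency,
            'snoozed': int(self.snoozed),
        }


# Fake clock: each check consumes a start tick and an end tick.
ticks = iter([0, 3, 10, 12, 20, 21])
stats = HealthCheckStats(clock=lambda: next(ticks))
stats.run_check(lambda: False)   # fails, took 3
stats.run_check(lambda: False)   # fails, took 2
after_failures = stats.snapshot()
stats.run_check(lambda: True)    # succeeds, took 1; failure streak resets
after_success = stats.snapshot()
```

Injecting the clock keeps the latency stat deterministic under test, which is also why the sketch avoids sleeping or calling real health endpoints.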

