aurora-reviews mailing list archives

From Joshua Cohen <jco...@apache.org>
Subject Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.
Date Fri, 02 Dec 2016 21:44:56 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review157764
-----------------------------------------------------------




src/main/python/apache/aurora/executor/common/health_checker.py (line 113)
<https://reviews.apache.org/r/54299/#comment228376>

    s/suppose/supposed



src/main/python/apache/aurora/executor/common/health_checker.py (lines 115 - 117)
<https://reviews.apache.org/r/54299/#comment228434>

    There is still a potential backwards incompatibility here. Under the previous
watch-driven updates, a task could flip between failing and successful health checks, and
as long as it was still running at the end of `watch_secs`, the updater would consider it healthy
and move on. With this new behavior, someone could configure a task in such a way that the
max attempts are consumed without reaching either `max_consecutive_failures` or `min_consecutive_successes`
before `watch_secs` has elapsed, meaning that the task would fail.
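To make the incompatibility concrete, here is a minimal Python sketch that models the two semantics over a sequence of health check results. It is not Aurora's updater or executor code: `watch_driven_ok`, `attempts_driven_ok`, and the counting rules (one check per `interval_secs`, a task failing once it exceeds `max_consecutive_failures` failures in a row, `initial_interval_secs` ignored) are simplifying assumptions.

```python
from typing import Sequence

# Hypothetical model of the two update semantics; not Aurora's code.
# Each entry in `results` is one health check outcome (True = passed), and
# checks are assumed to run once every `interval_secs`. For brevity the model
# ignores `initial_interval_secs`.


def watch_driven_ok(results: Sequence[bool], interval_secs: int, watch_secs: int,
                    max_consecutive_failures: int) -> bool:
    """Old behavior (assumed): healthy if still alive when `watch_secs` ends."""
    consecutive_failures = 0
    for ok in results[:watch_secs // interval_secs]:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures > max_consecutive_failures:
            return False  # task killed for failing too often in a row
    return True


def attempts_driven_ok(results: Sequence[bool], max_attempts: int,
                       max_consecutive_failures: int,
                       min_consecutive_successes: int) -> bool:
    """New behavior (assumed): must reach the success streak within max_attempts."""
    successes = failures = 0
    for ok in results[:max_attempts]:
        if ok:
            successes, failures = successes + 1, 0
            if successes >= min_consecutive_successes:
                return True
        else:
            successes, failures = 0, failures + 1
            if failures > max_consecutive_failures:
                return False
    return False  # attempts exhausted without becoming healthy


# A task that flaps between failing and passing every other check.
flapping = [False, True] * 5
print(watch_driven_ok(flapping, interval_secs=10, watch_secs=60,
                      max_consecutive_failures=1))  # True
print(attempts_driven_ok(flapping, max_attempts=6, max_consecutive_failures=1,
                         min_consecutive_successes=2))  # False
```

With the flapping pattern, the watch-driven model accepts the task because it never fails twice in a row, while the attempt-bounded model rejects it because it never reaches two consecutive successes within its budget.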
    
    As we discussed earlier, if we make `watch_secs` and `min_consecutive_successes` mutually
exclusive in the client, then the executor could only trigger the new behavior if the user
opted in by setting `watch_secs` to 0 and `min_consecutive_successes` to non-zero.
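The proposed opt-in could be enforced in the client along these lines. This is a hypothetical sketch: `check_update_mode` and `ConfigError` are illustrative names, not part of the Aurora client, and mapping the case where both values are zero to watch-driven behavior is a simplification, not something the comment specifies.

```python
class ConfigError(ValueError):
    """Hypothetical error for an invalid job configuration."""


def check_update_mode(watch_secs: int, min_consecutive_successes: int) -> str:
    """Return which update semantics a (hypothetical) client would select.

    Assumes the proposed rule: setting both knobs is rejected, and the new
    health-check-driven behavior is used only when the user opts in with
    watch_secs == 0 and min_consecutive_successes > 0.
    """
    if watch_secs < 0 or min_consecutive_successes < 0:
        raise ConfigError("watch_secs and min_consecutive_successes must be >= 0")
    if watch_secs > 0 and min_consecutive_successes > 0:
        raise ConfigError(
            "watch_secs and min_consecutive_successes are mutually exclusive")
    if watch_secs == 0 and min_consecutive_successes > 0:
        return "health-check-driven"
    return "watch-driven"


print(check_update_mode(45, 0))  # watch-driven
print(check_update_mode(0, 3))   # health-check-driven
```

Rejecting the combination up front means existing jobs that only set `watch_secs` keep their old semantics, so the executor never applies the attempt-bounded behavior to a job that did not ask for it.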


- Joshua Cohen


On Dec. 2, 2016, 8:43 a.m., Santhosh Kumar Shanmugham wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54299/
> -----------------------------------------------------------
> 
> (Updated Dec. 2, 2016, 8:43 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1841
>     https://issues.apache.org/jira/browse/AURORA-1841
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> It is possible to configure health checks such that a task can
> continually fail health checks with intermittent successes and still
> pass an update. Essentially, a task fails health checks during
> `initial_interval_secs` and for an additional `max_consecutive_failures`
> attempts, and then performs a successful health check to become healthy.
> 
> To remain backward compatible with the above configuration, include
> `max_consecutive_failures` when computing `max_attempts_to_running`.
> 
> 
> Diffs
> -----
> 
>   docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b 
>   src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49
>   src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2
> 
> Diff: https://reviews.apache.org/r/54299/diff/
> 
> 
> Testing
> -------
> 
> build-support/jenkins/build.sh
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>
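The change in the quoted description can be sketched as follows. The diff itself is not reproduced in this message, so the base formula (checks that fit in `initial_interval_secs` plus `min_consecutive_successes`) and the helper functions `max_attempts_to_running` and `becomes_healthy` are assumptions for illustration; only the idea of adding `max_consecutive_failures` to the attempt budget comes from the description.

```python
import math
from typing import Sequence


def max_attempts_to_running(initial_interval_secs: float, interval_secs: float,
                            min_consecutive_successes: int,
                            max_consecutive_failures: int,
                            include_failures: bool = True) -> int:
    """Hypothetical attempt budget for a task to become healthy."""
    # Health checks that fall inside the initial grace period.
    attempts = math.ceil(initial_interval_secs / interval_secs)
    # Passing checks needed in a row to be considered healthy.
    attempts += min_consecutive_successes
    if include_failures:
        # The proposed fix: leave room for max_consecutive_failures more failures.
        attempts += max_consecutive_failures
    return attempts


def becomes_healthy(results: Sequence[bool], budget: int,
                    min_consecutive_successes: int) -> bool:
    """True if the streak of passing checks is reached within `budget` checks."""
    streak = 0
    for ok in results[:budget]:
        streak = streak + 1 if ok else 0
        if streak >= min_consecutive_successes:
            return True
    return False


# Scenario from the description: fail throughout the initial interval (3 checks),
# fail max_consecutive_failures (2) more times, then pass once.
results = [False] * 5 + [True]
old_budget = max_attempts_to_running(30, 10, 1, 2, include_failures=False)  # 4
new_budget = max_attempts_to_running(30, 10, 1, 2)  # 6
print(becomes_healthy(results, old_budget, 1))  # False
print(becomes_healthy(results, new_budget, 1))  # True
```

Under these assumptions, a task that previously passed an update by succeeding only after the initial interval plus `max_consecutive_failures` failed checks would run out of attempts without the extra term, and succeeds with it.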

