aurora-reviews mailing list archives

From "Moses Nakamura" <nny...@gmail.com>
Subject Re: Review Request 31104: task-executor: TASK_RUNNING after first health check
Date Thu, 19 Feb 2015 04:03:20 GMT


> On Feb. 18, 2015, 7 p.m., Stephan Erb wrote:
> > For this change to be useful, we also have to think about the meaning of `initial_interval_secs`.
> > In its current form, health checks only start when the initial delay has passed. Commonly this
> > delay has to be set very high in order to guarantee that a task will come up even in a worst
> > case scenario (e.g., the server I pull my binary from is slow today). With your change, however,
> > no task would be considered running until this worst-case time window has passed.
> > 
> > A potential solution would be to change the meaning of `initial_interval_secs` to
> > always send health checks but to ignore any errors.

+1, that's a good idea.
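
A rough sketch of how that proposal could behave, to make the semantics concrete. This is not the Aurora `health_checker.py` code: the class name `GracePeriodHealthChecker`, its `tick` method, the `running`/`failed` flags, and the `max_consecutive_failures` parameter are all hypothetical names chosen for illustration. The assumption is that checks are sent from the start, a success at any time marks the task as running, and failures are only counted once `initial_interval_secs` has elapsed.

```python
import time


class GracePeriodHealthChecker:
    """Hypothetical sketch; not the actual Aurora HealthChecker."""

    def __init__(self, check, initial_interval_secs,
                 max_consecutive_failures=0, clock=time.monotonic):
        self._check = check                      # callable returning True if healthy
        self._initial_interval_secs = initial_interval_secs
        self._max_failures = max_consecutive_failures
        self._clock = clock                      # injectable so it can be tested
        self._start = clock()
        self._failures = 0
        self.running = False                     # would correspond to TASK_RUNNING
        self.failed = False

    def tick(self):
        """Run one health check and update the task state."""
        if self.failed:
            return
        if self._check():
            # First success promotes the task, even inside the initial window.
            self._failures = 0
            self.running = True
            return
        in_grace = self._clock() - self._start < self._initial_interval_secs
        if in_grace:
            return                               # errors are ignored during startup
        self._failures += 1
        if self._failures > self._max_failures:
            self.failed = True
```

With this shape, a fast-starting task is reported as running as soon as its first check passes, while a slow one still gets the full `initial_interval_secs` before any failure counts against it.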


- Moses


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31104/#review72980
-----------------------------------------------------------


On Feb. 18, 2015, 4:32 a.m., Moses Nakamura wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31104/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2015, 4:32 a.m.)
> 
> 
> Review request for Aurora and Brian Wickman.
> 
> 
> Bugs: AURORA-894
>     https://issues.apache.org/jira/browse/AURORA-894
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is the first step in changing TASK_RUNNING to mean that the application is alive
> and responding to health checks (if the task is configured to support health checks). This
> review is just to get feedback. I can't split this review into parts because the scheduler must
> be changed in lockstep with the executor, or everything will break.
> 
> I don't know if this is the right approach. Could you give me some high-level advice?
> I'm also not sure who to add to this review.
> 
> Here is the high-level description that we came up with:
> 
> http://mail-archives.apache.org/mod_mbox/incubator-aurora-dev/201412.mbox/%3CCAOTkfX4KTUpMVcjeFf5%3DvvGXb91to5baNSzvyiwtk-sTddxGXQ%40mail.gmail.com%3E
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 9c0282392dbb9cca308baf47adc1750c1f5cacc6
>   src/main/python/apache/aurora/executor/common/announcer.py dda76f018f472d7d8228459eb89f4c5daf9df26d
>   src/main/python/apache/aurora/executor/common/health_checker.py 60676ba0fbd8a218fe4309f07de28e2c66d54530
>   src/main/python/apache/aurora/executor/common/resource_manager.py 08e02e41b581f275f070228bb23c4cf2a0489f9a
>   src/main/python/apache/aurora/executor/common/status_checker.py 624921d68199df098ea51ee8a10815403bf58984
>   src/test/python/apache/aurora/executor/common/test_announcer.py 6b782778e52394de3744b43003226dac3f65169e
>   src/test/python/apache/aurora/executor/common/test_health_checker.py def249c2509a28f7145380f250f79202b653dc83
>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py 8f288f6115ab52265dfaffffda3f41d81271c55a
> 
> Diff: https://reviews.apache.org/r/31104/diff/
> 
> 
> Testing
> -------
> 
> This hangs after I call is_health_checks_enabled, and I don't know why. My suspicion
> is that I'm throwing an exception and cratering the task executor, but I don't know how to
> tell. How do I get it to print? I'm running it with:
> 
> ./pants test src/test/python/apache/aurora/executor::
> 
> 
> Thanks,
> 
> Moses Nakamura
> 
>

