mesos-reviews mailing list archives

From haosdent huang <haosd...@gmail.com>
Subject Re: Review Request 52865: Refactored HealthChecker to never stop health checking.
Date Fri, 11 Nov 2016 17:13:37 GMT


> On Oct. 17, 2016, 6:53 a.m., haosdent huang wrote:
> > src/health-check/health_checker.cpp, lines 206-217
> > <https://reviews.apache.org/r/52865/diff/1/?file=1537866#file1537866line206>
> >
> >     Now that we never stop health checking, `consecutiveFailures` may be reset to 0
after a later success. `killTask` would then change from `true` to `false` here. Is this the
expected behaviour?
> 
> Alexander Rukletsov wrote:
>     Very good point, Haosdent.
>     
>     The problem here is that **one entity decides** when a task should be killed, but
**another entity enforces** this. The first one cannot really control what the second does.
What is the least surprising behaviour in that unfortunate architecture? My opinion is to reset
if the second entity, i.e. the executor, does not comply.
>     
>     A better architecture would be to separate "health checker" from "unhealthy policy
enforcer". As we've already agreed, we need a "global" health check policy, see [MESOS-6171](https://issues.apache.org/jira/browse/MESOS-6171).
With two "unhealthy policies", local and global, the health checker library should simply
report the health status, while the executor will apply one of the policies (that may still
be implemented in a health checker library for code reuse). If you think this makes sense,
do you mind filing a ticket about this?

Got it, created at https://issues.apache.org/jira/browse/MESOS-6578
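
To make the separation discussed above concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the Mesos implementation: the names `HealthStatus`, `UnhealthyPolicy`, and `applyPolicy` are hypothetical and do not come from the Mesos codebase. The checker side only reports a status; a separate policy object, owned by the executor, turns reports into a kill decision.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical type: what a health checker would report after each check.
// In this sketch the checker never decides anything on its own.
struct HealthStatus {
  bool healthy;
};

// Hypothetical "unhealthy policy" applied by the executor.
// It counts consecutive failures and resets the counter on success.
class UnhealthyPolicy {
public:
  explicit UnhealthyPolicy(uint32_t maxConsecutiveFailures)
    : maxFailures(maxConsecutiveFailures) {}

  // Feeds one report into the policy and returns whether the task
  // should be killed according to the current counter.
  bool update(const HealthStatus& status) {
    if (status.healthy) {
      consecutiveFailures = 0;
    } else {
      ++consecutiveFailures;
    }
    return consecutiveFailures >= maxFailures;
  }

private:
  uint32_t maxFailures;
  uint32_t consecutiveFailures = 0;
};

// Applies the policy to a sequence of reports and records each decision.
std::vector<bool> applyPolicy(
    UnhealthyPolicy policy,
    const std::vector<HealthStatus>& reports) {
  std::vector<bool> decisions;
  for (const HealthStatus& status : reports) {
    decisions.push_back(policy.update(status));
  }
  return decisions;
}
```

With a threshold of 2 and the reports unhealthy, unhealthy, healthy, the decisions are `false`, `true`, `false`. That is the `true` to `false` flip raised in the review: if the executor does not act on the second report, the decision resets after the next success.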


- haosdent


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52865/#review152827
-----------------------------------------------------------


On Oct. 14, 2016, 12:37 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52865/
> -----------------------------------------------------------
> 
> (Updated Oct. 14, 2016, 12:37 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Benjamin Mahler, Gastón Kleiman, and haosdent
huang.
> 
> 
> Bugs: MESOS-5963
>     https://issues.apache.org/jira/browse/MESOS-5963
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Prior to this patch, HealthChecker would stop performing health
> checks after it marks the task for kill. Since tasks' lifecycle
> is managed by scheduler-executor, HealthChecker should never stop
> health checking on its own.
> 
> 
> Diffs
> -----
> 
>   src/docker/executor.cpp ab3f0473fdc9105d1c425f0dbe7b81c566d541e8 
>   src/health-check/health_checker.hpp 392b4d5bd1e5831994b9366c1eb5a2911e19860f 
>   src/health-check/health_checker.cpp 96ae1a733ff3d211b84d0893b4603873af1c89f0 
>   src/launcher/default_executor.cpp af4a97f7de5f2157aa65fdab742455b0683c40a4 
>   src/launcher/executor.cpp 3e95d6029bea0ce6e0dfb39c24b795fe98d90d13 
>   src/tests/health_check_tests.cpp 1d1676d7259bf52cfb1e499954fa815fe7e37522 
> 
> Diff: https://reviews.apache.org/r/52865/diff/
> 
> 
> Testing
> -------
> 
> See https://reviews.apache.org/r/52873/.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

