mesos-dev mailing list archives

From "Mesos ReviewBot" <...@mesos.apache.org>
Subject Re: Review Request 20097: Added a configurable limit on the percentage of slaves that can be removed after the re-registration timeout.
Date Wed, 16 Apr 2014 00:43:30 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20097/#review40486
-----------------------------------------------------------


Patch looks great!

Reviews applied: [19857, 20097]

All tests passed.

- Mesos ReviewBot


On April 16, 2014, 12:27 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20097/
> -----------------------------------------------------------
> 
> (Updated April 16, 2014, 12:27 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Bugs: MESOS-764
>     https://issues.apache.org/jira/browse/MESOS-764
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> After we recover, we remove any slaves that do not re-register within a timeout. As a
> safety measure, this patch adds a configurable limit on the percentage of slaves that can
> be removed in this manner.
> 
> This gives production operators a safety guarantee: if there is an unforeseen
> widespread failure, the Master will continue to fail over rather than proceed
> to remove a large percentage of slaves. Operators can tune this for their environment.
> 
> The current default is geared toward non-production environments, but I've added a
> TODO to explore adding a '--production' flag that will allow us to use different defaults
> (< 100% removal limit, no auto-initialization, etc).
> 
> See the flag description for more details.
> 
> We should add a 'Percentage' abstraction in the future per MESOS-1162.
> 
> 
> Diffs
> -----
> 
>   src/local/local.cpp 7daa5ecbfd9b3eeff548c09f32d5a380444204f7 
>   src/master/constants.hpp 52d8d779e92f3be2b84d9237a9abbd2f580c0906 
>   src/master/constants.cpp 1cb8f22558b5bd3d90b24fcc6bf70cbe615a335c 
>   src/master/flags.hpp 024f86d93824a20ce42c28b8264576f1cb715d0e 
>   src/master/main.cpp f12f20a1eabd163c4f35056bf01f28f3edd408a9 
>   src/master/master.cpp 3c3c989543167afb7d368a19a16457ed00e6be0c 
>   src/tests/cluster.hpp 8479fe370d5a64cb2827b69b6f0c626b9d156d66 
> 
> Diff: https://reviews.apache.org/r/20097/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> It is currently not possible to exercise this case in a test, because the code path uses EXIT, which terminates the process.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>
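
The removal limit described in the quoted request can be illustrated with a small sketch. This is not the code from the patch: the function name `exceedsRemovalLimit`, its parameters, and the choice to express the limit as a fraction between 0.0 and 1.0 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the Mesos patch.
// Returns true when removing `unregistered` out of `recovered` slaves
// would exceed `limit`, a fraction in [0.0, 1.0] (e.g. 0.10 for 10%).
bool exceedsRemovalLimit(
    std::size_t recovered,
    std::size_t unregistered,
    double limit)
{
  assert(limit >= 0.0 && limit <= 1.0);

  if (recovered == 0) {
    return false; // No slaves were recovered, so none can be removed.
  }

  const double fraction =
    static_cast<double>(unregistered) / static_cast<double>(recovered);

  return fraction > limit;
}
```

Under these assumptions, a master that recovered 100 slaves and saw 50 fail to re-register would exceed a 10% limit, and per the description above it would fail over instead of removing them. With a limit of 1.0 (100%), the check never triggers.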

