mesos-dev mailing list archives

From: "Ben Mahler" <benjamin.mah...@gmail.com>
Subject: Re: Review Request 20097: Added a configurable limit on the percentage of slaves that can be removed after the re-registration timeout.
Date: Tue, 15 Apr 2014 01:28:07 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20097/
-----------------------------------------------------------

(Updated April 15, 2014, 1:28 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Cleaned up comments and the flag help string, per Vinod's review.


Bugs: MESOS-764
    https://issues.apache.org/jira/browse/MESOS-764


Repository: mesos-git


Description
-------

After we recover, we remove any slaves that do not re-register within a timeout. As a safety
measure, this patch adds a configurable limit on the percentage of slaves that can be removed
in this manner.

This gives production operators a safety guarantee: if there is an unforeseen widespread
failure, the Master will continue to fail over rather than proceed to remove a large
percentage of slaves. Operators can tune the limit for their environment.
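
For readers without the diff open, the check described above can be sketched roughly as
follows. This is a minimal illustration, not the code from the patch: the function name
`exceedsRemovalLimit`, its parameters, and the representation of the limit as a fraction
between 0.0 and 1.0 are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not taken from the Mesos source): returns true when
// removing `unregistered` out of `recovered` slaves would exceed `limit`.
// `limit` is assumed to be a fraction in [0.0, 1.0], e.g. 0.05 for 5%.
bool exceedsRemovalLimit(
    std::size_t unregistered,
    std::size_t recovered,
    double limit)
{
  assert(limit >= 0.0 && limit <= 1.0);
  assert(unregistered <= recovered);

  if (recovered == 0) {
    return false; // Nothing was recovered, so nothing can be removed.
  }

  // Fraction of the recovered slaves that failed to re-register in time.
  const double fraction =
    static_cast<double>(unregistered) / static_cast<double>(recovered);

  return fraction > limit;
}
```

With a limit of 1.0 (100%) this check never trips, which matches a permissive default.
What the caller does when the limit is exceeded is outside the sketch; per the
description, the Master fails over instead of removing the slaves.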

The current default is geared toward non-production environments, but I've added a TODO
to explore adding a '--production' flag that would allow us to use different defaults (< 100%
removal limit, no auto-initialization, etc.).

See the flag description for more details.

We should add a 'Percentage' abstraction in the future per MESOS-1162.


Diffs (updated)
-----

  src/master/constants.hpp 52d8d779e92f3be2b84d9237a9abbd2f580c0906 
  src/master/constants.cpp 1cb8f22558b5bd3d90b24fcc6bf70cbe615a335c 
  src/master/flags.hpp 024f86d93824a20ce42c28b8264576f1cb715d0e 
  src/master/main.cpp f12f20a1eabd163c4f35056bf01f28f3edd408a9 
  src/master/master.cpp 3c3c989543167afb7d368a19a16457ed00e6be0c 

Diff: https://reviews.apache.org/r/20097/diff/


Testing
-------

make check

It is currently not possible to exercise this case in a test, because the code path calls
EXIT, which terminates the process.


Thanks,

Ben Mahler

