mesos-dev mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MESOS-1474) Provide cluster maintenance primitives for operators.
Date Thu, 12 Jun 2014 18:33:02 GMT
Benjamin Mahler created MESOS-1474:
--------------------------------------

             Summary: Provide cluster maintenance primitives for operators.
                 Key: MESOS-1474
                 URL: https://issues.apache.org/jira/browse/MESOS-1474
             Project: Mesos
          Issue Type: Epic
          Components: framework, master, slave
            Reporter: Benjamin Mahler


Normally cluster upgrades can be done seamlessly using the built-in slave recovery feature.
However, there are situations where operators want to be able to perform destructive maintenance
operations on machines:

* Non-recoverable slave upgrades.
* Machine reboots.
* Kernel upgrades.
* etc.

In these situations, best practice is to perform rolling maintenance in large batches of machines.
This can be problematic for frameworks when many related tasks are located within a batch
of machines going down for maintenance.

There are a few primitives of interest here:

* Provide a way for operators to fully shut down a slave (killing all tasks underneath it).
* Provide a way for operators to mark specific slaves as undergoing maintenance. This means
that no more offers are sent for these slaves, and no new tasks will launch on them.
* Provide a way for frameworks to be notified when resources are requested to be relinquished.
This gives the framework the opportunity to proactively move a task before it is forcibly
killed. It also allows the automation of operations like: "please drain and shut down these
slaves within 1 hour."
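
To make the three primitives concrete, here is a rough sketch of how they might fit together. It is a toy in-memory model only: ToyMaster, Slave, mark_maintenance, request_relinquish, and shutdown are all hypothetical names invented for illustration, not existing or proposed Mesos APIs, and the deadline is assumed to be a plain number of seconds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


# Hypothetical toy model; none of these names are Mesos APIs.
@dataclass
class Slave:
    slave_id: str
    tasks: Set[str] = field(default_factory=set)
    in_maintenance: bool = False
    shut_down: bool = False


class ToyMaster:
    def __init__(self) -> None:
        self.slaves: Dict[str, Slave] = {}
        # Callbacks standing in for framework notifications:
        # each receives (slave_id, deadline_secs).
        self.listeners: List[Callable[[str, float], None]] = []

    def add_slave(self, slave_id: str, tasks: Iterable[str] = ()) -> None:
        self.slaves[slave_id] = Slave(slave_id, set(tasks))

    def offerable_slaves(self) -> List[str]:
        # Slaves in maintenance or shut down produce no offers.
        return sorted(
            s.slave_id
            for s in self.slaves.values()
            if not s.in_maintenance and not s.shut_down
        )

    def mark_maintenance(self, slave_ids: Iterable[str]) -> None:
        # Primitive 2: stop offering these slaves; running tasks are untouched.
        for slave_id in slave_ids:
            self.slaves[slave_id].in_maintenance = True

    def request_relinquish(
        self, slave_ids: Iterable[str], deadline_secs: float
    ) -> None:
        # Primitive 3: tell frameworks that resources on these slaves
        # should be given up before the deadline.
        for slave_id in slave_ids:
            for notify in self.listeners:
                notify(slave_id, deadline_secs)

    def shutdown(self, slave_id: str) -> Set[str]:
        # Primitive 1: fully shut down a slave, killing all of its tasks.
        slave = self.slaves[slave_id]
        killed = set(slave.tasks)
        slave.tasks.clear()
        slave.shut_down = True
        return killed
```

Under these assumptions, "please drain and shut down these slaves within 1 hour" would amount to calling mark_maintenance, then request_relinquish with a deadline of 3600 seconds, and finally shutdown on each slave once the deadline passes.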



--
This message was sent by Atlassian JIRA
(v6.2#6252)
