mesos-issues mailing list archives

From "Joe Smith (JIRA)" <>
Subject [jira] [Commented] (MESOS-1474) Provide cluster maintenance primitives for operators.
Date Tue, 09 Sep 2014 22:42:28 GMT


Joe Smith commented on MESOS-1474:

The doc looks great! Thanks for all the hard work and rigor!

> Provide cluster maintenance primitives for operators.
> -----------------------------------------------------
>                 Key: MESOS-1474
>                 URL:
>             Project: Mesos
>          Issue Type: Epic
>          Components: framework, master, slave
>            Reporter: Benjamin Mahler
> Normally cluster upgrades can be done seamlessly using the built-in slave recovery feature.
> However, there are situations where operators want to be able to perform destructive maintenance
> operations on machines:
> * Non-recoverable slave upgrades.
> * Machine reboots.
> * Kernel upgrades.
> * Machine decommissioning.
> * etc.
> In these situations, best practice is to perform rolling maintenance in large batches
> of machines. This can be problematic for frameworks when many related tasks are located within
> a batch of machines going for maintenance.
> There are a few primitives of interest here:
> * Provide a way for operators to fully shut down a slave (killing all tasks underneath it).
> * Provide a way for operators to mark specific slaves as undergoing maintenance. This
> means that no more offers are being sent for these slaves, and no new tasks will launch on
> them.
> * Provide a way for frameworks to be notified when resources are requested to be relinquished.
> This gives the framework a chance to proactively move a task before it is forcibly killed. It also
> allows the automation of operations like: "please drain these slaves within 1 hour."

This message was sent by Atlassian JIRA
