aurora-issues mailing list archives

From "Maxim Khutornenko (JIRA)" <>
Subject [jira] [Assigned] (AURORA-651) perform_maintenance_hosts should not temporarily remove machines
Date Tue, 19 Aug 2014 17:31:19 GMT


Maxim Khutornenko reassigned AURORA-651:

    Assignee: Maxim Khutornenko

> perform_maintenance_hosts should not temporarily remove machines
> ----------------------------------------------------------------
>                 Key: AURORA-651
>                 URL:
>             Project: Aurora
>          Issue Type: Task
>          Components: Client
>            Reporter: David Robinson
>            Assignee: Maxim Khutornenko
> The aurora_admin tool provides the following drain/maintenance commands:
> - start_maintenance_hosts
>     The list of hosts is marked for maintenance, and will be de-prioritized
>     from consideration for scheduling.  Note, they are not removed from
>     consideration, and may still schedule tasks if resources are very scarce.
>     Usually you would mark a larger set of machines for drain, and then do
>     them in batches within the larger set, to help drained tasks not land on
>     future hosts that will be drained shortly in subsequent batches.
> - host_maintenance_status
>     Print the drain status of each supplied host.
> - perform_maintenance_hosts
>     Asks the scheduler to remove any running tasks from the machine and remove it
>     from service temporarily, perform some action on them, then return the machines
>     to service.
> - end_maintenance_hosts
>     The list of hosts is marked as not in a drained state anymore.  This will
>     allow normal scheduling to resume on the given list of hosts.
> The command that actually drains a machine is perform_maintenance_hosts, but it
> only drains the machine *temporarily*. As soon as the machine is drained it is
> placed back into service, which allows tasks to be scheduled on it again. This
> default behavior is wrong. The expected workflow is to pass the
> --post_drain_script option with a script that shuts down the slave, typically
> by SSHing in and stopping the mesos process. It is not obvious that
> --post_drain_script must be used with such a script to properly drain a
> machine, and the admin tool provides no other command that drains a machine
> *and leaves it drained*.
> The ideal solution is described in AURORA-43.
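
For illustration, the workaround described above (a post-drain script that SSHes into the drained host and stops the mesos process) might look roughly like the sketch below. Everything in it is an assumption rather than something the report states: that the script receives the hostname as its first argument, that the slave runs as a service named "mesos-slave", and that "sudo service ... stop" is the right way to stop it. The names build_stop_command and stop_slave are hypothetical.

```python
#!/usr/bin/env python
"""Hypothetical post-drain script: stop the Mesos slave on a drained host."""
import subprocess
import sys

# Assumption: the slave runs as a system service with this name.
SERVICE_NAME = "mesos-slave"


def build_stop_command(host):
    """Return the ssh command line that stops the slave service on `host`."""
    return ["ssh", host, "sudo", "service", SERVICE_NAME, "stop"]


def stop_slave(host):
    """Run the stop command over ssh and return its exit status."""
    return subprocess.call(build_stop_command(host))


# Assumption: the drained host's name arrives as the only argument.
# Do nothing unless exactly one hostname argument was supplied.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(stop_slave(sys.argv[1]))
```

The path to such a script would be handed to perform_maintenance_hosts through --post_drain_script. How the tool actually invokes the script is not described in this report, so the argument handling may need adjusting.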

This message was sent by Atlassian JIRA