mesos-user mailing list archives

From X Brick <ngdoc...@gmail.com>
Subject Re: MESOS-6233 Allow agents to re-register post a host reboot
Date Tue, 22 Nov 2016 04:14:45 GMT
Here is a hacky way to fix it in the current version: back up the
boot_id file (it should exist at $work_dir/meta/boot_id) when the Mesos
agent (or slave) starts, and restore it from the backup file when the
agent/slave restarts. The slave ID will then not change. It works fine for
our cluster.
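A minimal Python sketch of the two file operations described above. It assumes only the $work_dir/meta/boot_id layout given in the reply. The function names and the backup location are hypothetical. The reply does not say exactly when to run each step (for example, from a service wrapper around the agent), and the backup must live somewhere that a reboot does not wipe.

```python
import shutil
from pathlib import Path


def boot_id_path(work_dir):
    # Location taken from the reply: $work_dir/meta/boot_id.
    return Path(work_dir) / "meta" / "boot_id"


def backup_boot_id(work_dir, backup_path):
    """On agent start: copy meta/boot_id aside. Returns False if absent.

    backup_path is a hypothetical operator-chosen location that must
    survive a host reboot.
    """
    src = boot_id_path(work_dir)
    if not src.exists():
        return False
    Path(backup_path).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, backup_path)  # copy contents plus file metadata
    return True


def restore_boot_id(work_dir, backup_path):
    """On agent restart: put the backed-up boot_id back. Returns False if no backup."""
    backup = Path(backup_path)
    if not backup.exists():
        return False
    dst = boot_id_path(work_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup, dst)
    return True
```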

I hope it helps.

2016-11-15 23:37 GMT+08:00 Megha Sharma <msharma3@apple.com>:

> Hi All,
>
> We have been working on the design for Restartable tasks (MESOS-3545),
> and allowing agents to recover and re-register after a reboot is a
> prerequisite for that.
> Today, the agent doesn't recover its state (which includes its SlaveID)
> after a host reboot: it short-circuits the recovery upon discovering the
> reboot and registers with the master as a new agent. With Partition
> Awareness, the Mesos master even allows agents that have failed the
> master's health check pings (unreachable agents) to re-register with it
> and reconcile their tasks/executors. The executors on a rebooted host are
> terminated anyway, so there is no harm in letting such an agent recover
> and re-register with the master using its old SlaveID.
> We would like to hear from folks here whether you see any operational
> concerns with letting agents recover after a host reboot.
>
> MESOS JIRA: https://issues.apache.org/jira/browse/MESOS-6223
>
> Many Thanks
> Megha Sharma
>
>
>
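The reboot handling described in the quoted proposal comes down to a small decision. Below is a minimal Python sketch that assumes the agent detects a reboot by comparing a checkpointed boot ID with the host's current one. `CheckpointedState`, `slave_id_for_registration` and the `recover_after_reboot` flag are hypothetical names used for illustration. They are not Mesos APIs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CheckpointedState:
    # Hypothetical stand-in for what the agent checkpointed before stopping.
    slave_id: Optional[str]  # SlaveID from the previous registration, if any
    boot_id: Optional[str]   # host boot ID recorded at that time, if any


def slave_id_for_registration(
    state: CheckpointedState,
    current_boot_id: str,
    new_slave_id: str,
    recover_after_reboot: bool,
) -> Tuple[str, bool]:
    """Return (slave_id, recovered) for the agent's next registration.

    recover_after_reboot=False models the behavior described as current;
    recover_after_reboot=True models the proposal (hypothetical flag).
    """
    if state.slave_id is None:
        # Nothing to recover: first start on this host.
        return new_slave_id, False

    rebooted = state.boot_id is not None and state.boot_id != current_boot_id
    if rebooted and not recover_after_reboot:
        # Current behavior: short-circuit recovery, register as a new agent.
        return new_slave_id, False

    # No reboot, or a reboot under the proposed behavior: keep the old SlaveID.
    # Executors from before a reboot are gone, so their tasks still have to
    # be reconciled with the master after re-registration.
    return state.slave_id, True
```

With `recover_after_reboot=True`, a rebooted agent keeps its old SlaveID instead of registering as a new agent. That is the change the proposal asks operators to weigh in on.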
