incubator-mesos-dev mailing list archives

From " (Commented) (JIRA)" <>
Subject [jira] [Commented] (MESOS-110) Mesos deploys should not restart tasks
Date Tue, 03 Apr 2012 20:58:31 GMT

] commented on MESOS-110:


(Updated 2012-04-03 20:58:17.982746)

Review request for mesos, Benjamin Hindman and John Sirois.


merged with trunk


Sorry for the huge CL!

Slave restart now supports recovery!
--> Non-disruptive restart means running tasks are not lost
--> Re-connects with live executors
--> Checkpoints and reliably sends status updates
--> Ability to kill executors if the slave upgrade is incompatible with running executors
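
To illustrate the checkpoint-and-recover idea in the list above, here is a minimal sketch. It is not taken from the patch: the helper names (checkpointTasks, recoverTasks), the one-ID-per-line file format, and the use of POSIX rename semantics are all assumptions made for illustration only.

```cpp
#include <cstdio>
#include <fstream>
#include <set>
#include <string>

// Hypothetical helper, not from the patch: persist the IDs of running tasks.
// The data is written to a temporary file that is then renamed into place,
// so a crash in the middle of a write leaves the previous checkpoint intact
// (this assumes POSIX rename semantics, where rename replaces the target).
bool checkpointTasks(const std::string& path,
                     const std::set<std::string>& tasks)
{
  const std::string tmp = path + ".tmp";
  {
    std::ofstream out(tmp.c_str(), std::ios::trunc);
    if (!out) {
      return false;
    }
    for (std::set<std::string>::const_iterator it = tasks.begin();
         it != tasks.end(); ++it) {
      out << *it << '\n';
    }
    out.flush();
    if (!out) {
      return false;
    }
  }
  return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Hypothetical helper, not from the patch: read the checkpoint back after a
// restart. A missing or unreadable file yields an empty set, meaning there
// is nothing to recover.
std::set<std::string> recoverTasks(const std::string& path)
{
  std::set<std::string> tasks;
  std::ifstream in(path.c_str());
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty()) {
      tasks.insert(line);
    }
  }
  return tasks;
}
```

In this sketch a restarted slave would call recoverTasks on startup and try to re-connect to the executors of the returned tasks instead of relaunching them. The real patch presumably checkpoints more state than task IDs.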

This addresses bug MESOS-110.

Diffs (updated)

  src/ d5edaa2 
  src/common/hashset.hpp 1feb610 
  src/common/utils.hpp 1d81e21 
  src/exec/exec.cpp e8db407 
  src/launcher/launcher.cpp a141b9a 
  src/local/local.hpp 55f9eaf 
  src/local/local.cpp affe432 
  src/master/master.cpp 4dc9ee0 
  src/messages/messages.proto 87e1548 
  src/sched/sched.cpp dcadb10 
  src/slave/constants.hpp f0c8679 
  src/slave/isolation_module.hpp c896908 
  src/slave/lxc_isolation_module.hpp b7beefe 
  src/slave/lxc_isolation_module.cpp 66a2a89 
  src/slave/main.cpp 85cba25 
  src/slave/process_based_isolation_module.hpp f6f9554 
  src/slave/process_based_isolation_module.cpp 2b37d42 
  src/slave/slave.hpp 279bc7b 
  src/slave/slave.cpp 3358ec4 
  src/tests/fault_tolerance_tests.cpp 6772daf 
  src/tests/slave_restart_tests.cpp PRE-CREATION 
  src/tests/utils.hpp e81ec82 



make check.

Note that only the new test in tests/slave_restart_tests.cpp exercises recovery!

Recovery is disabled for old tests (though they still checkpoint relevant info!)



> Mesos deploys should not restart tasks
> --------------------------------------
>                 Key: MESOS-110
>                 URL:
>             Project: Mesos
>          Issue Type: Improvement
>          Components: framework
>            Reporter: Rob Benson
>            Assignee: Vinod Kone
> Running a long-lived service on Mesos has a significant drawback right now in that Mesos
> build deploys restart your tasks. This could lead to nontrivial outages for services that
> have a high warm-up time. Basically everything would need a graceful restart mechanism that
> allows a shutdown/restart with a new version of the code.
