mesos-dev mailing list archives

From "Martin Weindel" <>
Subject Re: Review Request 25035: Fix for MESOS-1688
Date Sat, 30 Aug 2014 18:34:03 GMT


(Updated Aug. 30, 2014, 6:34 p.m.)

Review request for mesos.


Uploaded the same diff once again.

Bugs: MESOS-1688

Repository: mesos-git


As already explained in JIRA MESOS-1688, some schedulers allocate memory only for the
executor and not for its tasks; the tasks are allocated CPU resources only.
Such a scheduler is not offered any idle CPUs once nearly all memory on the slave is in use.
This can easily lead to a deadlock (in the application, not in Mesos).

Simple example:
1. Scheduler allocates all memory of a slave for an executor.
2. Scheduler launches a task for this executor (allocating 1 CPU).
3. Task finishes: 1 CPU and 0 MB memory are allocatable.
4. No offers are made, as no memory is left. Scheduler waits for offers forever: deadlock
in the application.

To fix this problem, offers must be made whenever CPU resources are allocatable, without
considering allocatable memory.
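
To make the intended change concrete, here is a minimal sketch of the offer check before and
after the fix. It is not the actual diff: the Resources struct, the function names and the
threshold values are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the unallocated resources of a slave.
struct Resources {
  double cpus;   // allocatable CPUs
  double memMB;  // allocatable memory in MB
};

// Illustrative thresholds (assumed values, not taken from the diff).
constexpr double MIN_CPUS = 0.1;
constexpr double MIN_MEM_MB = 32.0;

// Behaviour described in the problem statement: an offer is made only if
// both CPU and memory are available.
bool allocatableBefore(const Resources& r) {
  return r.cpus >= MIN_CPUS && r.memMB >= MIN_MEM_MB;
}

// Behaviour proposed here: an offer is made if CPU is available, without
// considering memory. How memory-only leftovers should be handled is not
// stated in this request, so the sketch leaves that case out.
bool allocatableAfter(const Resources& r) {
  return r.cpus >= MIN_CPUS;
}

// Replays step 3 of the example: the task has finished, so 1 CPU and 0 MB
// of memory are allocatable on the slave.
inline void checkExample() {
  const Resources afterTask{1.0, 0.0};
  assert(!allocatableBefore(afterTask));  // no offer: scheduler waits forever
  assert(allocatableAfter(afterTask));    // offer is made: scheduler proceeds
}
```

With the first check the scheduler from the example never sees the idle CPU again; with the
second it gets an offer and can launch the next task on the existing executor.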

Diffs (updated)

  src/master/hierarchical_allocator_process.hpp 34f8cd658920b36b1062bd3b7f6bfbd1bcb6bb52 



Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested by running multiple
parallel Spark jobs in "fine-grained" mode to saturate allocatable memory. The jobs now run fine.
Without the patch, this load always caused a deadlock in all Spark jobs within one minute.


Martin Weindel
