mesos-dev mailing list archives

From "Martin Weindel" <martin.wein...@gmail.com>
Subject Re: Review Request 25035: Fix for MESOS-1688
Date Tue, 16 Sep 2014 21:14:08 GMT


> On Sept. 15, 2014, 3:23 p.m., Timothy St. Clair wrote:
> > src/master/hierarchical_allocator_process.hpp, line 837
> > <https://reviews.apache.org/r/25035/diff/7/?file=688721#file688721line837>
> >
> >     What happens in the case where all CPUs are taken but memory is available?
> >     It looks like it will return (true), but this should not be possible.
> >
> >     I think you want to give an offer in the case where there are CPU resources
> >     available, but memory is consumed by the executor.
> 
> Vinod Kone wrote:
>     Giving memory-only resources is OK as long as they are used for a task and not an
>     executor. See my comments above.
> 
> Timothy St. Clair wrote:
>     Could you please add a detailed comment in the code above the mod? On first
>     inspection it still leaves me feeling unsettled.

I agree with Vinod. An executor may make use of additional offered memory, e.g., for
expanding a cache.
In this scenario, the CPU resources that are already allocated are sufficient.
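
To make the two cases concrete, here is a minimal sketch of the check under discussion. It is
not the code from the diff: FreeResources, the two threshold constants, and both function names
are hypothetical stand-ins chosen for illustration, and the threshold values are assumptions.
It only contrasts a check that requires both CPU and memory with a relaxed check that accepts
either one.

```cpp
// Hypothetical stand-in for the unallocated resources of one slave.
// This is NOT the Mesos Resources class.
struct FreeResources {
  double cpus;   // unallocated CPUs
  double memMB;  // unallocated memory in MB
};

// Assumed minimum amounts worth offering (illustrative values only).
constexpr double kMinCpus = 0.1;
constexpr double kMinMemMB = 32.0;

// Strict check: an offer is made only if both CPU and memory are left.
// With this check, the "1 CPU, 0 MB" state never produces an offer.
inline bool allocatableStrict(const FreeResources& r) {
  return r.cpus >= kMinCpus && r.memMB >= kMinMemMB;
}

// Relaxed check: an offer is made if either CPU or memory is left.
// CPU-only offers serve tasks of an executor that already holds the memory;
// memory-only offers can serve an executor that wants to grow a cache.
inline bool allocatableRelaxed(const FreeResources& r) {
  return r.cpus >= kMinCpus || r.memMB >= kMinMemMB;
}
```

With these definitions, the state "1 CPU, 0 MB" is rejected by the strict check and accepted by
the relaxed one, and the state "0 CPUs, some memory" is accepted by the relaxed check as well,
which is the memory-only case discussed above.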


- Martin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/#review53343
-----------------------------------------------------------


On Sept. 16, 2014, 9:05 p.m., Martin Weindel wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25035/
> -----------------------------------------------------------
> 
> (Updated Sept. 16, 2014, 9:05 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-1688
>     https://issues.apache.org/jira/browse/MESOS-1688
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> As already explained in JIRA MESOS-1688, some schedulers allocate memory only for the
> executor and not for tasks. In this case, only CPU resources are allocated for tasks.
> Such a scheduler is not offered any idle CPUs if the slave has used up nearly all of its
> memory.
> This can easily lead to a deadlock (in the application, not in Mesos).
> 
> Simple example:
> 1. Scheduler allocates all memory of a slave for an executor
> 2. Scheduler launches a task for this executor (allocating 1 CPU)
> 3. Task finishes: 1 CPU, 0 MB memory allocatable.
> 4. No offers are made, as no memory is left. The scheduler will wait for offers forever:
>    a deadlock in the application.
> 
> To fix this problem, offers must be made if CPU resources are allocatable, without
> considering allocatable memory.
> 
> 
> Diffs
> -----
> 
>   CHANGELOG a822cc4 
>   src/common/resources.cpp edf36b1 
>   src/master/constants.cpp faa1503 
>   src/master/hierarchical_allocator_process.hpp 34f8cd6 
>   src/master/master.cpp 18464ba 
>   src/tests/allocator_tests.cpp 774528a 
> 
> Diff: https://reviews.apache.org/r/25035/diff/
> 
> 
> Testing
> -------
> 
> Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested running multiple
> parallel Spark jobs in "fine-grained" mode to saturate allocatable memory. The jobs run fine
> now. With the unpatched Mesos, this load always caused a deadlock in all Spark jobs within
> one minute.
> 
> 
> Thanks,
> 
> Martin Weindel
> 
>

