mesos-dev mailing list archives

From "Ben Mahler" <benjamin.mah...@gmail.com>
Subject Re: Review Request 22796: Add timeout to rescind unused offers
Date Mon, 28 Jul 2014 21:11:31 GMT


> On July 24, 2014, 6:27 p.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 3451-3457
> > <https://reviews.apache.org/r/22796/diff/7/?file=634636#file634636line3451>
> >
> >     I forgot to mention the bug here in my comment!
> >     
> >     By using an offerTimeout function, you can properly get the resources back from the allocator.
> >     
> >     This current patch removes the offer but doesn't tell the allocator!
> 
> Ben Mahler wrote:
>     Ideally we could capture the allocator expectations in the test, which would have caught this issue.
> 
> Timothy Chen wrote:
>     Not sure I understand. I thought the removeOffer call already handles rescinding offers, which also gives back allocated resources to the slave and framework?
>     This patch simply adds a timeout to call rescind when it's not claimed?

Take a look at the other calls to removeOffer. This one is in the same vein as this review: https://github.com/apache/mesos/blob/0.19.1/src/master/master.cpp#L1857

This trickiness was the motivation for: https://issues.apache.org/jira/browse/MESOS-1452

Let's improve the test here! We should expect that after the timeout, the scheduler receives
another offer for the same resources. That will not happen with the current diff.
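
As a rough illustration of that expectation, here is a self-contained toy model. It is not Mesos code: ToyAllocator, ToyMaster, and reofferedAfterTimeout are made-up names, and the real master and allocator interfaces differ. It only shows why dropping the offer without telling the allocator means no second offer ever arrives.

```cpp
#include <cassert>
#include <map>

// Toy model only: these types are hypothetical and are not the Mesos API.
struct ToyAllocator {
  int available;  // CPUs the allocator believes are free to offer.

  explicit ToyAllocator(int cpus) : available(cpus) {}

  // Hands out everything that is free; returns 0 if nothing is free.
  int allocate() {
    int cpus = available;
    available = 0;
    return cpus;
  }

  // Returns the resources of an unused offer to the free pool.
  void recover(int cpus) { available += cpus; }
};

struct ToyMaster {
  ToyAllocator allocator;
  std::map<int, int> offers;  // offer id -> CPUs in that offer
  int nextId;

  explicit ToyMaster(int cpus) : allocator(cpus), nextId(0) {}

  // Asks the allocator for resources and records an outstanding offer.
  // Returns the new offer id, or -1 if the allocator had nothing to give.
  int offer() {
    int cpus = allocator.allocate();
    if (cpus == 0) return -1;
    offers[nextId] = cpus;
    return nextId++;
  }

  // Timeout handler. With tellAllocator == false it only drops the offer,
  // so the allocator still counts those resources as allocated.
  void offerTimeout(int id, bool tellAllocator) {
    std::map<int, int>::iterator it = offers.find(id);
    if (it == offers.end()) return;  // Already removed; nothing to do.
    if (tellAllocator) allocator.recover(it->second);
    offers.erase(it);
  }
};

// True if the scheduler would see a second offer after the first times out.
bool reofferedAfterTimeout(bool tellAllocator) {
  ToyMaster master(4);
  int first = master.offer();
  assert(first >= 0);
  master.offerTimeout(first, tellAllocator);
  return master.offer() >= 0;
}
```

In this sketch, reofferedAfterTimeout(true) yields a second offer, while reofferedAfterTimeout(false) does not. A test that waits for the re-offer would therefore fail against a timeout path that skips the allocator.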


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22796/#review48670
-----------------------------------------------------------


On July 28, 2014, 8:34 p.m., Timothy Chen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22796/
> -----------------------------------------------------------
> 
> (Updated July 28, 2014, 8:34 p.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen.
> 
> 
> Bugs: MESOS-186
>     https://issues.apache.org/jira/browse/MESOS-186
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Based on Kapil's patch (https://reviews.apache.org/r/22066/), this adds a timeout for each offer from the master so that the offer is removed when it goes unused.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 32704ce 
>   src/master/master.hpp d8a4d9e 
>   src/master/master.cpp 273a516 
>   src/tests/master_tests.cpp 5a1cf7f 
> 
> Diff: https://reviews.apache.org/r/22796/diff/
> 
> 
> Testing
> -------
> 
> Added three more unit tests from Kapil's patch: offer not rescinded after a task is launched, and offer not rescinded when the framework/slave is unregistered.
> The tests exposed a race condition that can lead to a segfault if remove offer is called twice on the same offer.
> 
> make check.
> 
> 
> Thanks,
> 
> Timothy Chen
> 
>

