mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Bannier <benjamin.bann...@mesosphere.io>
Subject Re: Review Request 65482: Fixed allocator bookkeeping of pending operations on master failover.
Date Mon, 05 Feb 2018 19:42:57 GMT


> On Feb. 3, 2018, 12:06 a.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 7594-7597 (original), 7609-7612 (patched)
> > <https://reviews.apache.org/r/65482/diff/1/?file=1952241#file1952241line7609>
> >
> >     Is this function now only called with resources from already-existing frameworks?
Does that mean that we can update the code here? https://github.com/apache/mesos/blob/5a1433576eca20f15f1ea309fc202f4bbaf3b6c7/src/master/allocator/mesos/hierarchical.cpp#L690-L703

I am not sure we can make sure this will always happen in general. As a precaution I think
it is still a good idea to leave this check in the allocator, if only to document allocator-local
invariants.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review196752
-----------------------------------------------------------


On Feb. 2, 2018, 3:07 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2018, 3:07 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
>     https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch fixes a bug where pending operations on a resource provider
> resources where not properly accounted for in the allocator. This lead
> to assertion failures when the operation became terminal and we
> attempted to recover the used resources.
> 
> Since framework information is only remembered on agents if the
> framework launched a task, there exists the possibility that a master
> learns about an allocation to a framework unknown to it, yet. To
> accommodate that do not bookkeep allocations to unknown frameworks in
> the allocator and update code handling of terminal operation updates
> accordingly.
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp f33ff767dcb93556beb696c96f8cfc17baccb05e

>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`, also tested with a version of the test added in r/65045 which triggered
this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message