mesos-dev mailing list archives

From "Dominic Hamon" <dha...@twopensource.com>
Subject Re: Review Request 20981: Updated the Registrar to abort permanently upon encountering a Failure.
Date Fri, 02 May 2014 01:16:56 GMT


> On May 1, 2014, 3:01 p.m., Dominic Hamon wrote:
> > src/master/registrar.cpp, line 333
> > <https://reviews.apache.org/r/20981/diff/1/?file=573051#file573051line333>
> >
> >     worth adding a gauge for this size?
> 
> Ben Mahler wrote:
>     I think so, was planning to take that up in a separate review. :)

+1


> On May 1, 2014, 3:01 p.m., Dominic Hamon wrote:
> > src/master/registrar.cpp, line 451
> > <https://reviews.apache.org/r/20981/diff/1/?file=573051#file573051line451>
> >
> >     can you use CHECK_READY here instead? it's almost duplicating the logic though it looks like you have some extra cases.
> 
> Ben Mahler wrote:
>     Hm, I don't understand this comment, can you clarify? Maybe provide a code snippet?

CHECK_READY on store will abort with a reasonable message depending on the Future's state.
You have some extra conditions that would need to be added in a separate check (CHECK_SOME
on store.get()?), but I don't think my comment applies anyway, as 'abort' in this context is
actually calling LOG(ERROR) and fail. I expected 'abort' in this context to be calling, well, abort!

So ignore my comment about using CHECK_READY, but maybe consider a different name for abort?
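
To illustrate the distinction discussed above, here is a minimal sketch. It does not use libprocess or the Mesos sources: `State`, `notReadyReason`, `checkReadyOrDie` and `logAndFail` are hypothetical names invented for this example, and the message strings are assumptions rather than the real CHECK_READY output. The point is only that a CHECK-style helper terminates the process, while a log-and-fail helper reports the problem and lets the process keep running, which is why the name 'abort' can be surprising.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <optional>
#include <string>

// Hypothetical stand-in for a future's state; not libprocess's Future.
enum class State { PENDING, READY, FAILED, DISCARDED };

// Returns a reason when the state is not READY, std::nullopt otherwise.
std::optional<std::string> notReadyReason(
    State state, const std::string& failure = "")
{
  switch (state) {
    case State::READY:     return std::nullopt;
    case State::PENDING:   return std::string("is PENDING");
    case State::DISCARDED: return std::string("is DISCARDED");
    case State::FAILED:    return "is FAILED: " + failure;
  }
  return std::string("is in an unknown state");
}

// CHECK-style helper: terminates the process when the state is not READY.
void checkReadyOrDie(State state, const std::string& failure = "")
{
  if (auto reason = notReadyReason(state, failure)) {
    std::cerr << "Check failed: future " << *reason << std::endl;
    std::abort();
  }
}

// Log-and-fail helper: logs the reason and returns it to the caller,
// but the process keeps running.
std::optional<std::string> logAndFail(
    State state, const std::string& failure = "")
{
  auto reason = notReadyReason(state, failure);
  if (reason) {
    std::cerr << "Failing: future " << *reason << std::endl;
  }
  return reason;
}
```

With these definitions, `checkReadyOrDie(State::PENDING)` would end the process, whereas `logAndFail(State::PENDING)` only returns the reason to the caller.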


- Dominic


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20981/#review41962
-----------------------------------------------------------


On May 1, 2014, 6:09 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20981/
> -----------------------------------------------------------
> 
> (Updated May 1, 2014, 6:09 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Bugs: MESOS-1274
>     https://issues.apache.org/jira/browse/MESOS-1274
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> It's possible for a backed-up master (many items in its queue) to have many operations enqueued in the Registrar.
> 
> In this event, the Master won't commit suicide until the initial failure is processed. However, in the interim, subsequent operations are potentially being performed against the Registrar. This could lead to fighting between Masters if a "demoted" Master re-attempts to acquire log-leadership! This scenario can occur if the "demoted" master has a large queue and the demotion event is towards the back of the Master's queue.
> 
> It would be preferable to ensure that after losing log leadership, the "demoted" master does not try to re-acquire log leadership and write to the log.
> 
> This is the motivation for this patch.
> 
> 
> Diffs
> -----
> 
>   src/master/registrar.cpp fecc314df04c552212522168f7a5a17b77482e34 
>   src/tests/registrar_tests.cpp 917a470f326523fbf11e245f4156fc8ce1d974d5 
> 
> Diff: https://reviews.apache.org/r/20981/diff/
> 
> 
> Testing
> -------
> 
> Added a test and improved the existing tests.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>
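
The behavior motivated in the quoted description (fail once, then refuse every later operation instead of going back to the log) can be sketched roughly as below. This is not the patch itself: `LatchingRegistrar`, its `apply` method and the `write` callback are hypothetical names made up for illustration, the class is single-threaded, and a failed write simply stands in for losing log leadership.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, single-threaded stand-in for the Registrar; not the Mesos code.
class LatchingRegistrar {
public:
  // 'write' models a write to the replicated log and returns false on
  // failure (for example, after log leadership has been lost).
  explicit LatchingRegistrar(std::function<bool(const std::string&)> write)
    : write_(std::move(write)) {}

  // Applies one operation. Returns an error message on failure,
  // std::nullopt on success.
  std::optional<std::string> apply(const std::string& operation) {
    if (error_) {
      // Already failed once: refuse without touching the log again.
      return error_;
    }
    if (!write_(operation)) {
      error_ = "Failed to write '" + operation + "'; refusing further operations";
      std::cerr << *error_ << std::endl;
      return error_;
    }
    return std::nullopt;
  }

private:
  std::function<bool(const std::string&)> write_;
  std::optional<std::string> error_;  // Set at most once, never cleared.
};
```

Because the stored error is never cleared, operations that were already queued behind the first failure are rejected immediately, so the "demoted" instance never attempts another write even if the underlying log would accept one again.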

