mesos-dev mailing list archives

From "Niklas Nielsen" <...@qni.dk>
Subject Re: Review Request 20221: Changed executor state recovery to allow run recovery in absence of executor info.
Date Mon, 14 Apr 2014 18:22:38 GMT


> On April 14, 2014, 10:04 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, lines 3116-3118
> > <https://reviews.apache.org/r/20221/diff/2/?file=554602#file554602line3116>
> >
> >     I think what this is saying is:
> >     
> >     If we have a valid run (determined in the code above) then we're sure to have
> >     a checkpointed ExecutorInfo because the ExecutorInfo is checkpointed before we
> >     checkpoint any information about a run.
> >     
> >     But is it possible that a run is valid but for whatever reason recovering the
> >     ExecutorInfo fails? For example, because the file got corrupted, or was
> >     accidentally deleted?

If the executor info file gets corrupted or deleted, the check would fail.

How about extending the test on entry (the one that ensures runs are present and otherwise
gracefully GCs and aborts recovery) with ... || state.info.isNone() ?
The test will be removed in the task info patch anyway, as we deal with the missing executor
info explicitly there.
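
To make the suggestion concrete, here is a minimal sketch of what the extended entry check
could look like. Everything except the `state.info.isNone()` condition is an assumption made
for illustration: `Option`, `RunState`, `ExecutorState`, `RecoveryAction` and `checkOnEntry`
are toy stand-ins, not the real types or functions in src/slave/slave.cpp, and the
`runs.empty()` condition only stands in for the elided part of the existing test.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an optional value; hypothetical, not the real stout Option.
template <typename T>
class Option {
public:
  Option() : set_(false) {}
  Option(const T& t) : set_(true), value_(t) {}
  bool isNone() const { return !set_; }
  bool isSome() const { return set_; }
private:
  bool set_;
  T value_;
};

// Hypothetical, simplified view of the checkpointed state of one executor.
struct RunState {
  std::string containerId;
};

struct ExecutorState {
  Option<std::string> info;    // stand-in for the checkpointed ExecutorInfo
  std::vector<RunState> runs;  // checkpointed runs of this executor
};

enum class RecoveryAction { Recover, GarbageCollect };

// Entry check: bail out (GC and abort recovery) when there is nothing to
// recover, or when the executor info is missing, corrupted or deleted.
RecoveryAction checkOnEntry(const ExecutorState& state) {
  if (state.runs.empty() || state.info.isNone()) {
    return RecoveryAction::GarbageCollect;
  }
  return RecoveryAction::Recover;
}
```

With this shape, a state that has runs but no executor info takes the same GC-and-abort path
as a state with no runs at all, which is the case Ben asked about.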


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20221/#review40269
-----------------------------------------------------------


On April 10, 2014, 1:26 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20221/
> -----------------------------------------------------------
> 
> (Updated April 10, 2014, 1:26 p.m.)
> 
> 
> Review request for mesos, Ian Downes and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This patch lets executor recovery recover runs in the absence of
> executor info.  This is needed because the new task-info patch will
> introduce an intermediate state where the executor info hasn't been
> checkpointed. In this interim, the slave may fail over and should be in
> a position to clean up orphan containers (for now, the containerizer
> API doesn't provide a way to reconcile the executor info, so it is
> not possible to recover the containers in this case).
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.cpp cddb241 
>   src/slave/state.cpp 21d1fb7 
> 
> Diff: https://reviews.apache.org/r/20221/diff/
> 
> 
> Testing
> -------
> 
> Ran make check; also tested with the task-info patch and the new launch test.
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>

