mesos-reviews mailing list archives

From Kevin Klues <klue...@gmail.com>
Subject Re: Review Request 54355: Added implementation of `recover()` to the IOSwitchboard isolator.
Date Mon, 05 Dec 2016 23:16:47 GMT


> On Dec. 5, 2016, 9:11 p.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/io/switchboard.cpp, lines 537-547
> > <https://reviews.apache.org/r/54355/diff/3/?file=1576209#file1576209line537>
> >
> >     It's possible that the io switchboard server has been forked, but the agent
> >     crashes before it is able to checkpoint the pid.
> >     
> >     If that happens, during recovery, we will not maintain Info for that container.
> >     As a result, we won't try to clean up the socket file that was potentially created?
> >     
> >     I think we probably need to create a directory for io switchboard related files
> >     (sock and pid files). The existence of the directory indicates that the io
> >     switchboard server might or might not have been created. During recovery, if we
> >     find that the directory exists but the pid file does not, we should still create
> >     the Info with pid set to None(), and clean up the socket file in the 'cleanup'
> >     method.
> >     
> >     Thoughts?
> 
> Kevin Klues wrote:
>     That seems reasonable. What should we call the directory? Something like the layout below?
>     ```
>     io_switchboard
>     |-pid
>     -socket
>     ```

That said, note that the io-switchboard itself creates the sock file, so the only time this
would ever happen is if an agent restarted between successfully launching the io-switchboard
and checkpointing its pid.
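
The recovery behavior proposed above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in the patch: the `Info` struct, the `recoverInfo()` and `cleanup()` helpers, and the `pid` / `socket` file names are all assumptions made for the example, and it uses plain C++17 (`std::filesystem`, `std::optional`) rather than the stout / libprocess types the isolator would actually use.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <system_error>
#include <sys/types.h>

namespace fs = std::filesystem;

// Hypothetical stand-in for the isolator's per-container Info.
struct Info {
  // Empty when the agent died before the pid was checkpointed.
  std::optional<pid_t> pid;
};

// Sketch of the proposed recovery rule: the existence of the directory
// alone is enough to create an Info, because the server may have been
// forked even if no pid file was written.
std::optional<Info> recoverInfo(const fs::path& dir) {
  if (!fs::is_directory(dir)) {
    return std::nullopt;  // Nothing was ever set up for this container.
  }

  Info info;
  std::ifstream in(dir / "pid");  // Assumed file name.
  long pid = 0;
  if (in >> pid) {
    info.pid = static_cast<pid_t>(pid);
  }
  return info;  // pid stays empty if the file is missing or unreadable.
}

// Sketch of cleanup: remove the socket file (if the server got far
// enough to create it) and then the directory itself.
void cleanup(const fs::path& dir) {
  std::error_code ec;  // Ignore errors; the files may not exist.
  fs::remove(dir / "socket", ec);  // Assumed file name.
  fs::remove_all(dir, ec);
}
```

With this shape, a container whose directory exists but whose pid file is missing still gets an `Info` (with an empty pid), so a later `cleanup()` call has a chance to remove a stale socket file.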


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54355/#review158038
-----------------------------------------------------------


On Dec. 5, 2016, 9:46 a.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54355/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2016, 9:46 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-6688
>     https://issues.apache.org/jira/browse/MESOS-6688
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added implementation of `recover()` to the IOSwitchboard isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/io/switchboard.hpp 839665a22aca9b1c1c1cf4992406bc924ee2b065
>   src/slave/containerizer/mesos/io/switchboard.cpp d5211b98616e72a27ca6b472a5ee83505c227f22
> 
> Diff: https://reviews.apache.org/r/54355/diff/
> 
> 
> Testing
> -------
> 
> GTEST_FILTER="" make -j check
> sudo src/mesos-tests
> 
> Test added in follow-on patch.
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>

