mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jie Yu <yujie....@gmail.com>
Subject Re: Review Request 56195: Fixed ContainerLogger / IOSwitchboard FD leaks.
Date Mon, 27 Feb 2017 17:36:03 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/#review166829
-----------------------------------------------------------




include/mesos/slave/container_logger.hpp (lines 104 - 106)
<https://reviews.apache.org/r/56195/#comment238904>

    I don't think we need those anymore?



include/mesos/slave/container_logger.hpp (line 121)
<https://reviews.apache.org/r/56195/#comment238925>

    This will probably break windows build?



src/slave/containerizer/mesos/containerizer.cpp (line 1252)
<https://reviews.apache.org/r/56195/#comment238905>

    s/containerIo/containerIO/
    
    so that it's the same as that in the header.



src/slave/containerizer/mesos/containerizer.cpp (line 2089)
<https://reviews.apache.org/r/56195/#comment238983>

    This is a bit weird here, especially the name of the function suggests that it is a const
function, but in fact it has side effects.
    
    I'd suggest we simply rely on isolator cleanup to cleanup the reference to the ContainerIO
struct.



src/slave/containerizer/mesos/io/switchboard.hpp (lines 56 - 57)
<https://reviews.apache.org/r/56195/#comment238927>

    I would suggest we rename ContainerLogger::SubprocessInfo to ContainerIO. Also see my
comments below.



src/slave/containerizer/mesos/io/switchboard.cpp (lines 310 - 314)
<https://reviews.apache.org/r/56195/#comment238928>

    If we only have `ContainerIO`, then the logic here can just be using the default copy
constructor.
    
    Also, I'd suggest we don't use 'Owned' pointer here since ContainerIO is copyable.



src/slave/containerizer/mesos/io/switchboard.cpp (lines 311 - 313)
<https://reviews.apache.org/r/56195/#comment238926>

    The `std::move` here is not necessary. We haven't defined move constructor for Subprocess::IO.



src/slave/containerizer/mesos/io/switchboard.cpp (line 626)
<https://reviews.apache.org/r/56195/#comment238980>

    I'd suggest we try to remove this in `cleanup` as a safetynet.
    
    Also, as I mentioned above, let's cleanup the reference in cleanup so that we don't need
to perform the extra step in containerizer destroy path.


- Jie Yu


On Feb. 22, 2017, 7:50 p.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56195/
> -----------------------------------------------------------
> 
> (Updated Feb. 22, 2017, 7:50 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gastón Kleiman, Gilbert Song, Jie Yu,
Joseph Wu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7050
>     https://issues.apache.org/jira/browse/MESOS-7050
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the containizer launch path would leak FDs if the
> containerizer launch path failed between successfully calling
> prepare() on either the ContainerLogger (in the case of the Docker
> containerizer) or the IOSwitchboard (in the case of the mesos
> containerizer) and forking the actual container.
> 
> These components relied on the Subprocess call inside launcher->fork()
> to close these FDS on their behalf. If the containerizer launch path
> failed somewhere between calling prepare() and making this fork()
> call, these FDs would never be closed.
> 
> In the case of the IOSwitchboard, this would lead to deadlock in the
> destroy path because the future returned by the IOSwitchboard's
> cleanup function would never be satisfied. The IOSwitchboard doesn't
> shutdown until the FDs it allocates to the container have been closed.
> 
> This commit fixes this problem by updating the
> ContainerLogger::SubprocessInfo::FD abstraction to change the way
> manages FDS. Instead of tagging each FD with the Subprocess::IO::OWNED
> label and forcing the launcher->fork() call to deal with closing the
> FDs once it's forked a new subprocess, we now do things slightly
> differently now.
> 
> We now keep the default DUP label on each FD (instead fo changing it
> to OWNED) to cause launcher->fork() to dup it before mapping it onto
> the stdin/stdout/stderr of the subprocess. It then only closes the
> duped FD, leaving the original one open.
> 
> In doing so, it s now the contaienrizer's responsibility to ensure
> that these FDs are closed properly (whether that's between a
> successful prepare() call and launcher->fork()) or after
> launcher->fork() has completed successfully. While this has the
> potential to complicate things slightly on the SUCCESS path,
> at least it is now the containerizers's responsibility to close these
> FDS in *all* cases, rather than splitting that responsibility across
> components.
> 
> In order to simplify this, we've also modified the
> ContainerLogger::SubprocessInfo::FD abstraction to hold a Shared
> pointer to its underlying file descriptor and (optionally) close it on
> destruction. With this, we can ensure that all file descriptors
> created through this abstraction will be automatically closed onced
> their final reference goes out of scope (even if its been copied
> around several times).
> 
> In essence, this release the containerizer from the burden of manually
> closing these FDS itself. So long as it holds the final reference to
> these FDs (which it does), they will be automatically closed along
> *any* path out of containerizer->launch(). These are exactly the
> semantics we wanted to achieve.
> 
> In the case of the the ContainerLogger, ownership of these FDs (and
> thus their final reference) is passed to the containerizer in the
> SubprocessInfo struct returned by prepare(). In the case of the
> IOSwitchboard, we had to add a new API call to transfer ownership
> (since it is an isolator and prepare() can only return a protobuf),
> but the end result is the same.
> 
> 
> Diffs
> -----
> 
>   include/mesos/slave/container_logger.hpp a3f619b79ca0188df9e231c600dfa396f39ab29a 
>   include/mesos/slave/containerizer.proto c70d437ac3f8f813fcb71c04275cc8eeeb946065 
>   src/slave/containerizer/mesos/containerizer.hpp 10a9b57660388ac2409458a4d07af64cc3b185e2

>   src/slave/containerizer/mesos/containerizer.cpp d2b4f75a55dbe4746bc2dfc180335fa831a554ef

>   src/slave/containerizer/mesos/io/switchboard.hpp 7a7ad2da047ef316f4382e45526d503c87a903a0

>   src/slave/containerizer/mesos/io/switchboard.cpp b948f3c44dd2e1af2228ca1579f24ae3a94160b0

> 
> Diff: https://reviews.apache.org/r/56195/diff/
> 
> 
> Testing
> -------
> 
> Linux CentOS 7:
> ```
> GTEST_FILTER="" make -j check
> sudo src/mesos-tests
> ```
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message