mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jie Yu <yujie....@gmail.com>
Subject Re: mesos git commit: Fixed bug allowing IOSwitchboard::connect() after container destruction.
Date Tue, 24 Jan 2017 23:03:59 GMT
If 'cleanup' is invoked twice, that might cause signal being sent twice?

- Jie

On Tue, Jan 24, 2017 at 2:58 PM, <vinodkone@apache.org> wrote:

> Repository: mesos
> Updated Branches:
>   refs/heads/master 26923a1de -> 7f83b6e34
>
>
> Fixed bug allowing IOSwitchboard::connect() after container destruction.
>
> Review: https://reviews.apache.org/r/55810/
>
>
> Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
> Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/7f83b6e3
> Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/7f83b6e3
> Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/7f83b6e3
>
> Branch: refs/heads/master
> Commit: 7f83b6e34a2adef79a427e54f7a2dd4d5afbc1f5
> Parents: 26923a1
> Author: Kevin Klues <klueska@gmail.com>
> Authored: Tue Jan 24 14:58:33 2017 -0800
> Committer: Vinod Kone <vinodkone@gmail.com>
> Committed: Tue Jan 24 14:58:33 2017 -0800
>
> ----------------------------------------------------------------------
>  src/slave/containerizer/mesos/io/switchboard.cpp | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> ----------------------------------------------------------------------
>
>
> http://git-wip-us.apache.org/repos/asf/mesos/blob/7f83b6e3/
> src/slave/containerizer/mesos/io/switchboard.cpp
> ----------------------------------------------------------------------
> diff --git a/src/slave/containerizer/mesos/io/switchboard.cpp
> b/src/slave/containerizer/mesos/io/switchboard.cpp
> index 4b134f8..1b8f490 100644
> --- a/src/slave/containerizer/mesos/io/switchboard.cpp
> +++ b/src/slave/containerizer/mesos/io/switchboard.cpp
> @@ -695,7 +695,7 @@ Future<http::Connection> IOSwitchboard::connect(
>        })
>      .then(defer(self(), [=]() -> Future<http::Connection> {
>        if (!infos.contains(containerId)) {
> -        return Failure("Container has or is being destroyed");
> +        return Failure("I/O switchboard has shutdown");
>        }
>
>        // TODO(jieyu): We might still get a connection refused error
> @@ -758,8 +758,6 @@ Future<Nothing> IOSwitchboard::cleanup(
>    Option<pid_t> pid = infos[containerId]->pid;
>    Future<Option<int>> status = infos[containerId]->status;
>
> -  infos.erase(containerId);
> -
>    // If we have a pid, then we attempt to send it a SIGTERM to have it
>    // shutdown gracefully. This is best effort, as it's likely that the
>    // switchboard has already shutdown in the common case.
> @@ -794,6 +792,19 @@ Future<Nothing> IOSwitchboard::cleanup(
>    // DISCARDED cases as well.
>    return await(list<Future<Option<int>>>{status}).then(
>        defer(self(), [this, containerId]() -> Future<Nothing> {
> +        // We only remove the 'containerId from our info struct once
> +        // we are sure that the I/O switchboard has shutdown. If we
> +        // removed it any earlier, attempts to connect to the I/O
> +        // switchboard would fail.
> +        //
> +        // NOTE: One caveat of this approach is that this lambda will
> +        // be invoked multiple times if `cleanup()` is called multiple
> +        // times before the first instance of it is triggered. This is
> +        // OK for now because the logic below has no side effects. If
> +        // the logic below gets more complicated, we may need to
> +        // revisit this approach.
> +        infos.erase(containerId);
> +
>          // Best effort removal of the unix domain socket file created for
>          // this container's `IOSwitchboardServer`. If it hasn't been
>          // checkpointed yet, or the socket file itself hasn't been
> created,
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message