mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alexander Rukletsov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-9116) Launch nested container session fails due to incorrect detection of `mnt` namespace of command executor's task.
Date Thu, 06 Sep 2018 11:45:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586014#comment-16586014
] 

Alexander Rukletsov edited comment on MESOS-9116 at 9/6/18 11:44 AM:
---------------------------------------------------------------------

{noformat}
commit d95a16e03d27a2b6575148183e53a3b4507a16c1
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Mon Aug 20 16:22:33 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Mon Aug 20 16:22:33 2018 +0200

    Added `LaunchNestedContainerSessionInParallel` test.
    
    This patch adds a test which verifies that launching multiple
    short-lived nested container sessions succeeds. This test
    implicitly verifies that agent correctly detects `mnt` namespace
    of a command executor's task. If the detection fails, the
    containerizer launcher (aka `nanny`) process fails to enter `mnt`
    namespace, so it prints an error message into stderr for this
    nested container.
    
    This test is disabled until we fix MESOS-8545.
    
    Review: https://reviews.apache.org/r/68256/
{noformat}
{noformat}
commit e78f636d84f2709da17275f7d70265520c0f4f94
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Mon Aug 20 16:28:31 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Mon Aug 20 16:28:31 2018 +0200

    Fixed incorrect `mnt` namespace detection of command executor's task.
    
    Previously, we were walking the process tree from the container's
    `init` process to find the first process along the way whose `mnt`
    namespace differs from the `init` process. We expected this algorithm
    to always return the PID of the command executor's task.
    
    However, if someone launches multiple nested containers within the
    process tree, the aforementioned algorithm might detect the PID of
    one of those nested container instead of the command executor's task.
    Even though the `mnt` namespace will be the same across all these
    candidates, the detected PID might belong to a short-lived container,
    which might terminate before the containerizer launcher (aka `nanny`
    process) tries to enter its `mnt` namespace.
    
    This patch fixes the detection algorithm so that it always returns
    the PID of the command executor's task.
    
    Review: https://reviews.apache.org/r/68257/
{noformat}
{noformat}
commit 31499a5dc1de29fa2178e6ea9e5398d8c668a933
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Mon Aug 20 16:28:38 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Mon Aug 20 16:28:38 2018 +0200

    Added `ROOT_CGROUPS_LaunchNestedDebugAfterUnshareMntNamespace` test.
    
    This test verifies detection of task's `mnt` namespace for a debug
    nested container. Debug nested container must enter `mnt` namespace
    of the task, so the agent tries to detect task's `mnt` namespace.
    This test launches a long-running task which runs a subtask that
    unshares `mnt` namespace. The structure of the resulting process tree
    is similar to the process tree of the command executor (the task of
    the command executor unshares `mnt` ns):
    
      0. root (aka "nanny"/"launcher" process) [root `mnt` namespace]
        1. task: sleep 1000 [root `mnt` namespace]
          2. subtaks: sleep 1000 [subtask's `mnt` namespace]
    
    We expect that the agent detects task's `mnt` namespace.
    
    Review: https://reviews.apache.org/r/68408/
{noformat}
{noformat}
commit b3c9c6939964831170e819f88134af7b275ffe1b
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Mon Aug 20 16:28:44 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Mon Aug 20 16:28:44 2018 +0200

    Fixed wrong `mnt` namespace detection for non-command executor tasks.
    
    Previously, we were calling `getMountNamespaceTarget()` not only in
    case of the command executor but in all other cases too, including
    the default executor. That might lead to various subtle bugs, caused by
    wrong detection of `mnt` namespace target. This patch fixes the issue
    by setting a parent PID as `mnt` namespace target in case of
    non-command executor task.
    
    Review: https://reviews.apache.org/r/68348/
{noformat}
{noformat}
commit 52be35f47caea2712a0b13d7f963f7236533a2f1
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Thu Sep 6 13:41:06 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Thu Sep 6 13:41:06 2018 +0200

    Fixed `LaunchNestedContainerSessionsInParallel` test.
    
    Previously, we sent `ATTACH_CONTAINER_OUTPUT` to attach to a
    short-living nested container. An attempt to attach to a terminated
    nested container leads to HTTP 500 error. This patch gets rid of
    `ATTACH_CONTAINER_OUTPUT` in favor of `LAUNCH_NESTED_CONTAINER_SESSION`
    so that we can read the container's output without using an extra call.
    
    Review: https://reviews.apache.org/r/68236/
{noformat}


was (Author: alexr):
{noformat}
commit d95a16e03d27a2b6575148183e53a3b4507a16c1
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Mon Aug 20 16:22:33 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Mon Aug 20 16:22:33 2018 +0200

    Added `LaunchNestedContainerSessionInParallel` test.
    
    This patch adds a test which verifies that launching multiple
    short-lived nested container sessions succeeds. This test
    implicitly verifies that agent correctly detects `mnt` namespace
    of a command executor's task. If the detection fails, the
    containerizer launcher (aka `nanny`) process fails to enter `mnt`
    namespace, so it prints an error message into stderr for this
    nested container.
    
    This test is disabled until we fix MESOS-8545.
    
    Review: https://reviews.apache.org/r/68256/
{noformat}
{noformat}
commit e78f636d84f2709da17275f7d70265520c0f4f94
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Mon Aug 20 16:28:31 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Mon Aug 20 16:28:31 2018 +0200

    Fixed incorrect `mnt` namespace detection of command executor's task.
    
    Previously, we were walking the process tree from the container's
    `init` process to find the first process along the way whose `mnt`
    namespace differs from the `init` process. We expected this algorithm
    to always return the PID of the command executor's task.
    
    However, if someone launches multiple nested containers within the
    process tree, the aforementioned algorithm might detect the PID of
    one of those nested container instead of the command executor's task.
    Even though the `mnt` namespace will be the same across all these
    candidates, the detected PID might belong to a short-lived container,
    which might terminate before the containerizer launcher (aka `nanny`
    process) tries to enter its `mnt` namespace.
    
    This patch fixes the detection algorithm so that it always returns
    the PID of the command executor's task.
    
    Review: https://reviews.apache.org/r/68257/
{noformat}
{noformat}
commit 31499a5dc1de29fa2178e6ea9e5398d8c668a933
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Mon Aug 20 16:28:38 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Mon Aug 20 16:28:38 2018 +0200

    Added `ROOT_CGROUPS_LaunchNestedDebugAfterUnshareMntNamespace` test.
    
    This test verifies detection of task's `mnt` namespace for a debug
    nested container. Debug nested container must enter `mnt` namespace
    of the task, so the agent tries to detect task's `mnt` namespace.
    This test launches a long-running task which runs a subtask that
    unshares `mnt` namespace. The structure of the resulting process tree
    is similar to the process tree of the command executor (the task of
    the command executor unshares `mnt` ns):
    
      0. root (aka "nanny"/"launcher" process) [root `mnt` namespace]
        1. task: sleep 1000 [root `mnt` namespace]
          2. subtaks: sleep 1000 [subtask's `mnt` namespace]
    
    We expect that the agent detects task's `mnt` namespace.
    
    Review: https://reviews.apache.org/r/68408/
{noformat}
{noformat}
commit b3c9c6939964831170e819f88134af7b275ffe1b
Author:     Andrei Budnik <abudnik@mesosphere.com>
AuthorDate: Mon Aug 20 16:28:44 2018 +0200
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Mon Aug 20 16:28:44 2018 +0200

    Fixed wrong `mnt` namespace detection for non-command executor tasks.
    
    Previously, we were calling `getMountNamespaceTarget()` not only in
    case of the command executor but in all other cases too, including
    the default executor. That might lead to various subtle bugs, caused by
    wrong detection of `mnt` namespace target. This patch fixes the issue
    by setting a parent PID as `mnt` namespace target in case of
    non-command executor task.
    
    Review: https://reviews.apache.org/r/68348/
{noformat}

> Launch nested container session fails due to incorrect detection of `mnt` namespace of
command executor's task.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9116
>                 URL: https://issues.apache.org/jira/browse/MESOS-9116
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization
>    Affects Versions: 1.4.2, 1.5.1, 1.6.1, 1.7.0
>            Reporter: Andrei Budnik
>            Assignee: Andrei Budnik
>            Priority: Critical
>              Labels: mesosphere
>             Fix For: 1.5.2, 1.6.2, 1.7.0
>
>         Attachments: pstree.png
>
>
> Launch nested container call might fail with the following error:
> {code:java}
> Failed to enter mount namespace: Failed to open '/proc/29473/ns/mnt': No such file or
directory
> {code}
> This happens when the containerizer launcher [tries to enter|https://github.com/apache/mesos/blob/077f122d52671412a2ab5d992d535712cc154002/src/slave/containerizer/mesos/launch.cpp#L879-L892]
`mnt` namespace using the pid of a terminated process. The pid [was detected|https://github.com/apache/mesos/blob/077f122d52671412a2ab5d992d535712cc154002/src/slave/containerizer/mesos/containerizer.cpp#L1930-L1958]
by the agent before spawning the containerizer launcher process, because the process was
running back then.
> The issue can be reproduced using the following test (pseudocode):
> {code:java}
> launchTask("sleep 1000")
> parentContainerId = containerizer.containers().begin()
> outputs = []
> for i in range(10):
>   ContainerId containerId
>   containerId.parent = parentContainerId
>   containerId.id = UUID.random()
>   LAUNCH_NESTED_CONTAINER_SESSION(containerId, "echo echo")
>   response = ATTACH_CONTAINER_OUTPUT(containerId)
>   outputs.append(response.reader)
> for output in outputs:
>   stdout, stderr = getProcessIOData(output)
>   assert("echo" == stdout + stderr){code}
> When we start the very first nested container, `getMountNamespaceTarget()` returns a
PID of the task (`sleep 1000`), because it's the only process whose `mnt` namespace differs
from the parent container. This nested container becomes a child of PID 1 process, which is
also a parent of the command executor. It's not an executor's child! It can be seen in attached
`pstree.png`.
> When we start a second nested container, `getMountNamespaceTarget()` might return PID
of the previous nested container (`echo echo`) instead of the task's PID (`sleep 1000`).
It happens because the first nested container entered `mnt` namespace of the task. Then, the
containerizer launcher ("nanny" process) attempts to enter `mnt` namespace using the PID of
a terminated process, so we get this error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message