mesos-issues mailing list archives

From "Qian Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-9334) Container stuck at ISOLATING state due to libevent poll never returns
Date Wed, 24 Oct 2018 09:18:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661975#comment-16661975
] 

Qian Zhang edited comment on MESOS-9334 at 10/24/18 9:17 AM:
-------------------------------------------------------------

After reading some libevent code and our code that calls libevent, I think the root cause of this issue is a race: after we ask libevent to poll an fd, that fd ends up disabled inside libevent. Here is the flow (a minimal standalone sketch follows the list):
 # Container1 is launched, and the cgroups memory subsystem calls `cgroups::memory::oom::listen()` to listen for OOM events for this container. Internally that function opens an fd, asks libevent to poll it, and returns a future to the cgroups memory subsystem.
 # Container1 exits, and when we destroy it, the cleanup method of the cgroups memory subsystem discards the future obtained in #1. As a result, `Listener::finalize()` is called (see [this code|https://github.com/apache/mesos/blob/1.7.0/src/linux/cgroups.cpp#L1069:L1087] for details), and it will:
 ** Discard the future returned by the libevent poll, which causes `pollDiscard()` to be called and in turn triggers `pollCallback` to be executed *asynchronously* (see [this code|https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp#L66:L70] for details).
 ** Close the fd opened in #1 *immediately*, which means that fd number can be reused right away.
 # Container2 is launched, and the CNI isolator calls `io::read` to read the stdout/stderr of the CNI plugin for this container. Internally `io::read` *reuses* the fd number closed in #2 and asks libevent to poll it.
 # Now `pollCallback` for container1 is executed. It deletes the poll object, which triggers `event_free` to deallocate the event for container1 (see [this code|https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp#L50:L52] for details). Internally `event_free` calls `event_del` => `event_del_internal` => `evmap_io_del` => `evsel->del` to *disable* the fd (see [this code|https://github.com/libevent/libevent/blob/release-2.0.22-stable/event-internal.h#L78:L79] for details), but that fd is now being used to read the stdout/stderr of container2 (#3). Since the fd is disabled inside libevent, the `io::read` we issued in #3 never returns, so container2 is stuck in the `ISOLATING` state.
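
To make the race easier to see, below is a minimal standalone sketch (not Mesos code) that replays steps 1-4 with raw libevent calls. The pipes and callback here are made up for illustration, and it assumes Linux with libevent's default epoll backend; the exact failure mode may differ on other backends.

{code:cpp}
// Minimal sketch of the fd-reuse race, assuming libevent 2.x on Linux
// with the default epoll backend. Build with: g++ sketch.cpp -levent
#include <event2/event.h>
#include <unistd.h>
#include <cstdio>

static void onReady(evutil_socket_t fd, short /*what*/, void* /*arg*/)
{
  printf("fd %d became ready\n", (int) fd);
}

int main()
{
  event_base* base = event_base_new();

  // Step 1: "container1" opens an fd and asks libevent to poll it.
  int pipe1[2];
  pipe(pipe1);
  event* ev1 = event_new(base, pipe1[0], EV_READ, &onReady, nullptr);
  event_add(ev1, nullptr);

  // Step 2: container1 is destroyed; its fd is closed *immediately*,
  // but the event itself is only freed later (asynchronously).
  close(pipe1[0]);

  // Step 3: "container2" opens a new fd. The kernel hands back the
  // lowest free descriptor number, i.e. the one just closed, and a
  // new event is registered for that same fd number.
  int pipe2[2];
  pipe(pipe2);  // pipe2[0] reuses the number of pipe1[0]
  event* ev2 = event_new(base, pipe2[0], EV_READ, &onReady, nullptr);
  event_add(ev2, nullptr);

  // Step 4: the deferred cleanup for container1 finally runs and frees
  // the stale event. libevent's per-fd bookkeeping still contained the
  // old event, so the reused fd number ends up in a state where the
  // backend never reports readiness for ev2.
  event_free(ev1);

  // ev2 never fires even though data is available, mirroring the
  // `io::read` that never returns and the container stuck at ISOLATING.
  write(pipe2[1], "x", 1);
  event_base_dispatch(base);  // hangs forever

  return 0;
}
{code}

In this sketch, moving `event_free(ev1)` to *before* the `close(pipe1[0])` (i.e. restoring the ordering that the flow above violates) lets `ev2` fire normally.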


> Container stuck at ISOLATING state due to libevent poll never returns
> ---------------------------------------------------------------------
>
>                 Key: MESOS-9334
>                 URL: https://issues.apache.org/jira/browse/MESOS-9334
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Critical
>
> We found that a UCR container may get stuck in the `ISOLATING` state:
> {code:java}
> 2018-10-03 09:13:23: I1003 09:13:23.274561 2355 containerizer.cpp:3122] Transitioning the state of container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54 from PREPARING to ISOLATING
> 2018-10-03 09:13:23: I1003 09:13:23.279223 2354 cni.cpp:962] Bind mounted '/proc/5244/ns/net' to '/run/mesos/isolators/network/cni/1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54/ns' for container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54
> 2018-10-03 09:23:22: I1003 09:23:22.879868 2354 containerizer.cpp:2459] Destroying container 1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54 in ISOLATING state
> {code}
>  In the above logs, the state of container `1e5b8fc3-5c9e-4159-a0b9-3d46595a5b54` was transitioned to `ISOLATING` at 09:13:23, but it did not transition to any other state until it was destroyed due to the executor registration timeout (10 mins). And the destroy can never complete, since it needs to wait for the container to finish isolating.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
