mesos-reviews mailing list archives

From Alexander Rukletsov <ruklet...@gmail.com>
Subject Re: Review Request 59746: Separated discarded and failed cases for container launch.
Date Mon, 26 Jun 2017 15:31:47 GMT


> On June 2, 2017, 4:37 p.m., Jie Yu wrote:
> > src/slave/slave.cpp
> > Line 5147 (original), 5147 (patched)
> > <https://reviews.apache.org/r/59746/diff/1/?file=1740554#file1740554line5147>
> >
> >     Can you explain in what scenario the `future` will be in the DISCARDED state? Who discards the promise associated with this future?
> 
> Alexander Rukletsov wrote:
>     Sure. Consider docker containerizer.
>     
>     1) During container launch, docker containerizer calls `pull()`: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1238
>     2) The container enters `PULLING` state: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L435
>     3) While the image is being pulled by docker, future `containers_[containerId]->pull` is returned from `pull()`: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L446
>     4) This future is part of the `.then` chain returned from `_launch()`: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L1269
>     5) Now while docker is pulling, `destroy()` is called, which discards the "pulling future": https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     6) But discarding that future is propagated up the chain: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1410-L1411
>     7) Which triggers the `onAny` callback attached to launch: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L2800-L2810
>     8) Which in turn results in the discarded future being treated as a launch error: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/slave.cpp#L5147-L5152
> 
> Jie Yu wrote:
>     https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/slave/containerizer/docker.cpp#L2126-L2128
>     
>     This requests a discard on the future, but does not necessarily transition the future to the DISCARDED state. That's why Future has both `hasDiscard` and `isDiscarded` methods: they mean different things. Can you point me to where the promise associated with this future is actually transitioned into the DISCARDED state?
> 
> Alexander Rukletsov wrote:
>     Sure. In this case, we discard the pull if the client has discarded the future: https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/src/docker/docker.cpp#L1512
>     
>     Additionally, I've manually reproduced the issue (https://issues.apache.org/jira/browse/MESOS-7601)
>     ```
>     ./src/mesos-execute --master=192.99.40.208:5050 --containerizer=docker --docker_image=ubuntu:16.04 --name=pull-test --command="sleep 1000"
>     ```
>     Aborting this command right after the start, while docker was still pulling the image, yielded the following verbose agent log:
>     ```
>     I0621 12:59:22.271728 28980 fetcher.cpp:324] Starting to fetch URIs for container: e2227d2f-fb6e-4fba-b6b6-528d2da7b276, directory: /tmp/a/slaves/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-S0/frameworks/f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003/executors/pull-test/runs/e2227d2f-fb6e-4fba-b6b6-528d2da7b276
>     I0621 12:59:22.272665 28989 docker.cpp:1352] Running docker -H unix:///var/run/docker.sock inspect ubuntu:16.04
>     I0621 12:59:22.420902 28990 docker.cpp:1426] Running docker -H unix:///var/run/docker.sock pull ubuntu:16.04
>     I0621 12:59:23.070950 28980 slave.cpp:3130] Asked to shut down framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 by master@192.99.40.208:5050
>     I0621 12:59:23.071007 28980 slave.cpp:3155] Shutting down framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     I0621 12:59:23.071146 28980 slave.cpp:5625] Shutting down executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     W0621 12:59:23.071171 28980 slave.hpp:732] Unable to send event to executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003: unknown connection type
>     I0621 12:59:28.072532 28984 slave.cpp:5698] Killing executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     I0621 12:59:28.072849 28985 docker.cpp:2125] Destroying container e2227d2f-fb6e-4fba-b6b6-528d2da7b276 in PULLING state
>     I0621 12:59:28.073074 28985 docker.cpp:149] 'docker -H unix:///var/run/docker.sock pull ubuntu:16.04' is being discarded
>     E0621 12:59:28.150388 28981 slave.cpp:5183] Container 'e2227d2f-fb6e-4fba-b6b6-528d2da7b276' for executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed to start: future discarded
>     E0621 12:59:28.150698 28978 slave.cpp:5290] Termination of executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003 failed: unknown container
>     W0621 12:59:28.150737 28985 composing.cpp:569] Attempted to destroy unknown container e2227d2f-fb6e-4fba-b6b6-528d2da7b276
>     I0621 12:59:28.150754 28978 slave.cpp:5403] Cleaning up executor 'pull-test' of framework f123fa01-fe6e-45f4-b0a9-022b0fc3ce26-0003
>     ```
>     
>     I believe killing the process tree leads to a discarded future being returned by the `Subprocess` call.
>     
>     The question here, I think, is whether it is safe to _always_ treat discarded container launch attempts as non-failures. I would argue it makes sense, because for failures we should use future failures : ). What do you think?
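
To make the proposal concrete, here is a minimal sketch of how a launch callback could tell the three outcomes apart. `LaunchResult`, `Outcome`, `Classified` and `classify()` are hypothetical stand-ins invented for illustration; they are not the agent's code and not the patch under review.

```cpp
#include <string>

// Hypothetical stand-in for the terminal state of the launch future.
enum class LaunchState { READY, FAILED, DISCARDED };

struct LaunchResult {
  LaunchState state;
  std::string failure;  // Only meaningful when state == FAILED.
};

// Hypothetical outcomes a caller could act on.
enum class Outcome { LAUNCHED, LAUNCH_FAILED, LAUNCH_ABANDONED };

struct Classified {
  Outcome outcome;
  std::string message;
};

// Treats only a FAILED future as a launch error. A DISCARDED future is
// reported separately, so that an aborted launch is not logged as a failure.
Classified classify(const LaunchResult& result) {
  switch (result.state) {
    case LaunchState::READY:
      return {Outcome::LAUNCHED, ""};
    case LaunchState::FAILED:
      return {Outcome::LAUNCH_FAILED, "failed to start: " + result.failure};
    case LaunchState::DISCARDED:
      return {Outcome::LAUNCH_ABANDONED, "launch was discarded"};
  }
  // Unreachable for valid enum values; keeps compilers from warning.
  return {Outcome::LAUNCH_FAILED, "unknown state"};
}
```

With this split, the "failed to start: future discarded" line from the log above would fall into the abandoned branch instead of the error branch.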

I've realised that I have not answered your question explicitly : ). So when `docker pull`
is forcefully killed and the corresponding process is reaped, the `subprocess.status` future
is set to ready, but the chained one (`___pull` if my mental compiler works correctly) transitions
to `discarded` because of [1], leading to the original `launch` future being discarded as
well.

[1] https://github.com/apache/mesos/blob/9e61b3c7af35a29664361067c0bfa8b460bfefb9/3rdparty/libprocess/include/process/future.hpp#L1297
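
The mechanics above can be sketched with a toy model. This is not libprocess: `ToyFuture`, `chain()` and `simulateDestroyDuringPull()` are hypothetical names, and the chaining rule encodes my reading of [1] (an upstream future that becomes ready after a discard was requested causes the chained future to be discarded), which is an assumption rather than a statement about the real implementation.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Toy model only; all names here are hypothetical.
enum class State { PENDING, READY, FAILED, DISCARDED };

struct ToyFuture {
  State state = State::PENDING;
  bool discardRequested = false;
  std::vector<std::function<void()>> onDiscard;
  std::vector<std::function<void(const ToyFuture&)>> onAny;

  bool hasDiscard() const { return discardRequested; }
  bool isDiscarded() const { return state == State::DISCARDED; }

  // A discard *request*: the state stays PENDING.
  void discard() {
    if (state != State::PENDING || discardRequested) return;
    discardRequested = true;
    for (auto& f : onDiscard) f();
  }

  // Terminal transition, done by whoever owns the "promise" side.
  void transition(State s) {
    if (state != State::PENDING) return;
    state = s;
    for (auto& f : onAny) f(*this);
  }
};

using FuturePtr = std::shared_ptr<ToyFuture>;

// Chains a new future onto `prev`. A discard request on the new future is
// forwarded to `prev`; if `prev` becomes READY after a discard was requested,
// the new future transitions to DISCARDED instead of READY.
FuturePtr chain(const FuturePtr& prev) {
  auto next = std::make_shared<ToyFuture>();
  std::weak_ptr<ToyFuture> weakPrev = prev;
  next->onDiscard.push_back([weakPrev]() {
    if (auto p = weakPrev.lock()) p->discard();
  });
  prev->onAny.push_back([next](const ToyFuture& p) {
    if (p.state == State::READY && !p.hasDiscard()) {
      next->transition(State::READY);
    } else if (p.state == State::FAILED) {
      next->transition(State::FAILED);
    } else {
      next->transition(State::DISCARDED);
    }
  });
  return next;
}

// Replays the scenario: `status` stands in for `subprocess.status`, `pull`
// for the pulling future, and `launch` for what the agent observes.
State simulateDestroyDuringPull() {
  auto status = std::make_shared<ToyFuture>();
  auto pull = chain(status);
  auto launch = chain(pull);
  pull->discard();                   // destroy() requests a discard.
  status->transition(State::READY);  // The killed process is reaped.
  return launch->state;
}
```

In this model, calling `discard()` alone leaves every future pending with `hasDiscard()` true; the DISCARDED state only appears once the upstream future completes, which is the distinction between `hasDiscard` and `isDiscarded` raised earlier in the thread.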


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59746/#review176789
-----------------------------------------------------------


On June 2, 2017, 1:10 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59746/
> -----------------------------------------------------------
> 
> (Updated June 2, 2017, 1:10 p.m.)
> 
> 
> Review request for mesos, Ian Downes, Jie Yu, Joseph Wu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-7601
>     https://issues.apache.org/jira/browse/MESOS-7601
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> A discarded future returned from containerizer->launch() does not
> necessarily mean that the container launch has failed. For example,
> a framework may stop while its tasks are being started.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 5f80170fcd3c05add8b6e9e3107cff062818c1dc 
>   include/mesos/v1/mesos.proto 4b528751006f709f841e44f48c9f5c2dc035b402 
>   src/slave/slave.cpp 0c7e5f4ef905b3897d341c3147a208fc7a8a12e0 
> 
> 
> Diff: https://reviews.apache.org/r/59746/diff/1/
> 
> 
> Testing
> -------
> 
> make check on several Linux distros.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

