mesos-issues mailing list archives

From "Liqiang Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-5188) docker executor thinks task is failed when docker container was stopped
Date Wed, 13 Apr 2016 01:01:25 GMT

    [ https://issues.apache.org/jira/browse/MESOS-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238353#comment-15238353 ]

Liqiang Lin commented on MESOS-5188:
------------------------------------

If that's the case, the executor should remove the stopped docker container during executor
shutdown, rather than try to stop the docker container again.

> docker executor thinks task is failed when docker container was stopped
> -----------------------------------------------------------------------
>
>                 Key: MESOS-5188
>                 URL: https://issues.apache.org/jira/browse/MESOS-5188
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 0.28.0
>            Reporter: Liqiang Lin
>             Fix For: 0.29.0
>
>
> Test cases:
> 1. Launch a task with Swarm (on Mesos).
> {code}
> # docker -H 192.168.56.110:54375 run -d --cpu-shares 1 ubuntu sleep 300
> {code}
> 2. Then stop the docker container.
> {code}
> # docker -H 192.168.56.110:54375 ps
> CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
> b4813ba3ed4d        ubuntu              "sleep 300"         9 seconds ago       Up 8 seconds                            mesos1/mesos-2cd5576e-6260-4262-a62c-b0dc45c86c45-S1.1595e79b-aef2-44b6-a313-ad4ff8626958
> # docker -H 192.168.56.110:54375 stop b4813ba3ed4d
> b4813ba3ed4d
> {code}
> 3. Observe that the task is reported as failed. See the Mesos slave log:
> {code}
> I0407 09:10:57.606552 32307 slave.cpp:1508] Got assigned task 99ee7dc74861 for framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:10:57.608230 32307 slave.cpp:1627] Launching task 99ee7dc74861 for framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:10:57.609979 32307 paths.cpp:528] Trying to chown '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9' to user 'root'
> I0407 09:10:57.615881 32307 slave.cpp:5586] Launching executor 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9'
> I0407 09:12:18.458449 32307 slave.cpp:1845] Queuing task '99ee7dc74861' for executor '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:12:18.459092 32307 slave.cpp:3711] No pings from master received within 75secs
> I0407 09:12:18.460212 32307 slave.cpp:4593] Current disk usage 56.53%. Max allowed age: 2.342613645432778days
> I0407 09:12:18.463484 32307 slave.cpp:928] Re-detecting master
> I0407 09:12:18.463969 32307 slave.cpp:975] Detecting new master
> I0407 09:12:18.464501 32307 slave.cpp:939] New master detected at master@192.168.56.110:5050
> I0407 09:12:18.464848 32307 slave.cpp:964] No credentials provided. Attempting to register without authentication
> I0407 09:12:18.465237 32307 slave.cpp:975] Detecting new master
> I0407 09:12:18.463611 32312 status_update_manager.cpp:174] Pausing sending status updates
> I0407 09:12:18.465744 32312 status_update_manager.cpp:174] Pausing sending status updates
> I0407 09:12:18.472323 32313 docker.cpp:1011] Starting container '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' for task '99ee7dc74861' (and executor '99ee7dc74861') of framework '5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000'
> I0407 09:12:18.588739 32313 slave.cpp:1218] Re-registered with master master@192.168.56.110:5050
> I0407 09:12:18.588927 32313 slave.cpp:1254] Forwarding total oversubscribed resources
> I0407 09:12:18.589320 32313 slave.cpp:2395] Updating framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 pid to scheduler(1)@192.168.56.110:53375
> I0407 09:12:18.592079 32308 status_update_manager.cpp:181] Resuming sending status updates
> I0407 09:12:18.592842 32313 slave.cpp:2534] Updated checkpointed resources from  to
> I0407 09:12:18.592793 32308 status_update_manager.cpp:181] Resuming sending status updates
> I0407 09:12:20.582041 32307 slave.cpp:2836] Got registration for executor '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 from executor(1)@192.168.56.110:40725
> I0407 09:12:20.584446 32307 docker.cpp:1308] Ignoring updating container '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' with resources passed to update is identical to existing resources
> I0407 09:12:20.585093 32307 slave.cpp:2010] Sending queued task '99ee7dc74861' to executor '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 at executor(1)@192.168.56.110:40725
> I0407 09:12:21.307077 32312 slave.cpp:3195] Handling status update TASK_RUNNING (UUID: a7098650-cbf6-4445-8216-b5f658d2f5f4) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 from executor(1)@192.168.56.110:40725
> I0407 09:12:21.308820 32308 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: a7098650-cbf6-4445-8216-b5f658d2f5f4) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:12:21.310058 32308 slave.cpp:3593] Forwarding the update TASK_RUNNING (UUID: a7098650-cbf6-4445-8216-b5f658d2f5f4) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 to master@192.168.56.110:5050
> I0407 09:12:21.310744 32308 slave.cpp:3503] Sending acknowledgement for status update TASK_RUNNING (UUID: a7098650-cbf6-4445-8216-b5f658d2f5f4) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 to executor(1)@192.168.56.110:40725
> I0407 09:12:21.315765 32307 status_update_manager.cpp:392] Received status update acknowledgement (UUID: a7098650-cbf6-4445-8216-b5f658d2f5f4) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:16:13.943831 32306 slave.cpp:3195] Handling status update TASK_FAILED (UUID: 552e07b9-f7b8-4442-8e5c-8e17b948f6dc) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 from executor(1)@192.168.56.110:40725
> I0407 09:16:14.063880 32307 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 552e07b9-f7b8-4442-8e5c-8e17b948f6dc) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:16:14.064471 32312 slave.cpp:3593] Forwarding the update TASK_FAILED (UUID: 552e07b9-f7b8-4442-8e5c-8e17b948f6dc) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 to master@192.168.56.110:5050
> I0407 09:16:14.065014 32312 slave.cpp:3503] Sending acknowledgement for status update TASK_FAILED (UUID: 552e07b9-f7b8-4442-8e5c-8e17b948f6dc) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 to executor(1)@192.168.56.110:40725
> I0407 09:16:14.071398 32307 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 552e07b9-f7b8-4442-8e5c-8e17b948f6dc) for task 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:16:14.950019 32311 slave.cpp:3721] executor(1)@192.168.56.110:40725 exited
> I0407 09:16:14.993948 32308 docker.cpp:1997] Executor for container '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' has exited
> I0407 09:16:14.994104 32308 docker.cpp:1717] Destroying container '250a169f-7aba-474d-a4f5-cd24ecf0e7d9'
> I0407 09:16:14.994875 32308 docker.cpp:1845] Running docker stop on container '250a169f-7aba-474d-a4f5-cd24ecf0e7d9'
> I0407 09:16:14.996547 32308 slave.cpp:4097] Executor '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 exited with status 0
> I0407 09:16:14.996695 32308 slave.cpp:4201] Cleaning up executor '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000 at executor(1)@192.168.56.110:40725
> I0407 09:16:14.997028 32312 gc.cpp:55] Scheduling '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9' for gc 6.99998846192days in the future
> I0407 09:16:14.997289 32308 slave.cpp:4289] Cleaning up framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:16:14.997691 32307 status_update_manager.cpp:282] Closing status update streams for framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000
> I0407 09:16:14.997697 32312 gc.cpp:55] Scheduling '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000/executors/99ee7dc74861' for gc 6.99998845958222days in the future
> I0407 09:16:14.998423 32312 gc.cpp:55] Scheduling '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-0000' for gc 6.99998845141333days in the future
> I0407 09:16:18.470573 32308 slave.cpp:4593] Current disk usage 56.56%. Max allowed age: 2.341048334320162days
> {code}
> Why does the docker executor think the task failed when the docker container was stopped? The docker container still exists. Should the executor instead rely on whether the docker container has been removed?
> Docker executor code:
> {code}
>   void reaped(
>       ExecutorDriver* _driver,
>       const Future<Nothing>& run)
>   {
>     // Wait for docker->stop to finish, and best effort wait for the
>     // inspect future to complete with a timeout.
>     stop.onAny(defer(self(), [=](const Future<Nothing>&) {
>       inspect
>         .after(DOCKER_INSPECT_TIMEOUT, [=](const Future<Nothing>&) {
>           inspect.discard();
>           return inspect;
>         })
>         .onAny(defer(self(), [=](const Future<Nothing>&) {
>           CHECK_SOME(driver);
>           TaskState state;
>           string message;
>           if (!stop.isReady()) {
>             state = TASK_FAILED;
>             message = "Unable to stop docker container, error: " +
>                       (stop.isFailed() ? stop.failure() : "future discarded");
>           } else if (killed) {
>             state = TASK_KILLED;
>           } else if (!run.isReady()) {
>             state = TASK_FAILED;
>             message = "Docker container run error: " +
>                       (run.isFailed() ?
>                        run.failure() : "future discarded");
> {code}
> Another problem: Mesos thinks the task failed and cleans up the executor, but the orphaned docker container is not removed. I can still start it manually, and Mesos cannot track the resource usage of this orphaned container.
> {code}
> # docker -H 192.168.56.110:54375 start b4813ba3ed4d
> b4813ba3ed4d
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
