mesos-user mailing list archives

From Connor Doyle <con...@mesosphere.io>
Subject Re: Docker odd behavior
Date Wed, 22 Oct 2014 23:42:34 GMT
Hi Eduardo,

There is a known defect in Mesos that matches your description, tracked in these tickets:
https://issues.apache.org/jira/browse/MESOS-1915
https://issues.apache.org/jira/browse/MESOS-1884

A fix is under review and will be included in the next release:
https://reviews.apache.org/r/26486

You see the killTask because Marathon's default --task_launch_timeout is 60 seconds; note that the kill in your slave log arrives almost exactly one minute after the launch.
I've created an issue to improve the logging around this:
https://github.com/mesosphere/marathon/issues/732
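
If your containers legitimately need more time to pull and start, you can raise the timeout when launching Marathon. A minimal sketch, assuming you use the bundled start script (the flag takes milliseconds; the ZooKeeper address is illustrative):

  ./bin/start --master zk://<zk-host>:2181/mesos \
    --task_launch_timeout 300000   # 5 minutes, matching your executor_registration_timeout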

--
Connor


On Oct 22, 2014, at 16:18, Eduardo Jiménez <yoeduardoj@gmail.com> wrote:

> Hi,
> 
> I've started experimenting with Mesos using the docker containerizer, and running a simple example got into a very strange state.
> 
> I have mesos-0.20.1 and marathon-0.7 set up on EC2, using Amazon Linux:
> 
> Linux <ip> 3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> 
> Docker version 1.2.0, build fa7b24f/1.2.0
> 
> I start the mesos-slave with these relevant options (a full invocation is sketched below):
> 
> --cgroups_hierarchy=/cgroup
> --containerizers=docker,mesos
> --executor_registration_timeout=5mins
> --isolation=cgroups/cpu,cgroups/mem
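> 
> For reference, a full invocation with these flags looks roughly like this (the master address is the one from my logs; other flags omitted):
> 
>   mesos-slave \
>     --master=10.0.0.147:5050 \
>     --cgroups_hierarchy=/cgroup \
>     --containerizers=docker,mesos \
>     --executor_registration_timeout=5mins \
>     --isolation=cgroups/cpu,cgroups/mem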
> 
> I launched a very simple app, one of the Mesosphere examples:
> 
> {
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "libmesos/ubuntu"
>     }
>   },
>   "id": "ubuntu-docker2",
>   "instances": "1",
>   "cpus": "0.5",
>   "mem": "512",
>   "uris": [],
>   "cmd": "while sleep 10; do date -u +%T; done"
> }
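> 
> I submit it to Marathon with something like this (host and filename are illustrative):
> 
>   curl -X POST http://<marathon-host>:8080/v2/apps \
>     -H 'Content-Type: application/json' \
>     -d @ubuntu-docker2.json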
> 
> The app launches, but then Mesos states the task is KILLED, yet the docker container is STILL running. Here's the sequence of logs from that mesos-slave.
> 
> 1) Task gets created and assigned:
> 
> I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 20141017-172055-3489660938-5050-1603-0000
> I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 20141017-172055-3489660938-5050-1603-0000
> I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework '20141017-172055-3489660938-5050-1603-0000'
> I1022 17:44:13.989893 15195 docker.cpp:743] Starting container 'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework '20141017-172055-3489660938-5050-1603-0000'
> 
> So far so good. But the log statements right after "Starting container" are:
> 
> I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000
> I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000 from @0.0.0.0:0
> W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework 20141017-172055-3489660938-5050-1603-0000 because it has no tasks
> E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources for container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for terminal task, destroying container: No container found
> 
> After this, there are several log messages like these:
> 
> I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000
> I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000 to master@10.0.0.147:5050
> W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000
> I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000 to master@10.0.0.147:5050
> 
> 
> Eventually the TASK_KILLED update is acked and the Mesos UI shows the task as killed. By then, the process should be dead, but it's not.
> 
> $ sudo docker ps
> CONTAINER ID        IMAGE                    COMMAND                CREATED             STATUS              PORTS               NAMES
> f76784e1af8b        libmesos/ubuntu:latest   "/bin/sh -c 'while s   5 hours ago         Up 5 hours                              mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978
> 
> 
> The container shows in the UI like this:
> 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799    ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799    KILLED    5 hours ago    5 hours ago
> And it's been running the whole time.
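> 
> For now the only way I've found to get rid of the orphaned container is by hand, e.g.:
> 
>   sudo docker stop mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978
>   sudo docker rm mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978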
> 
> There's no other logging indicating why killTask was invoked, which makes this extremely frustrating to debug.
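> 
> The closest I can get is to grep the slave and master logs for the task ID (log locations depend on how you run Mesos):
> 
>   grep 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' \
>     <mesos-log-dir>/mesos-slave.INFO <mesos-log-dir>/mesos-master.INFO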
> 
> Has anyone seen something similar?
> 
> Thanks,
> 
> Eduardo
> 

