mesos-user mailing list archives

From Eduardo Jiménez <yoeduar...@gmail.com>
Subject Re: Docker odd behavior
Date Fri, 24 Oct 2014 00:58:45 GMT
That worked, but first I had to destroy Marathon's state in ZooKeeper, as it
just kept trying to deploy apps that I was trying to destroy, and no amount
of cancelling deployments or scaling down apps helped.
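
For anyone who hits the same wall, the ZooKeeper cleanup was essentially
this (the /marathon znode is just the default --zk path and the zkCli.sh
location varies by install; note that this wipes ALL Marathon state):

  $ /path/to/zookeeper/bin/zkCli.sh -server localhost:2181
  [zk: localhost:2181(CONNECTED) 0] rmr /marathon

and then I restarted Marathon before retrying.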

After that, I had to kill all of the docker processes that were still being
supervised by the mesos-slaves, and finally, I had to reboot one
mesos-slave that still believed one of the docker tasks was running
underneath.
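
Concretely, the per-slave cleanup looked something like this (the container
IDs are placeholders; the mesos- name prefix is what the Docker
containerizer gives the containers it launches):

  $ sudo docker ps --no-trunc | grep mesos-
  $ sudo docker kill <container-id>
  $ sudo docker rm <container-id>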

Now that I've done all that, I can deploy a Docker app without the whole
thing going crazy on me.

Thanks!

Eduardo

On Thu, Oct 23, 2014 at 1:50 AM, Nils De Moor <nils.de.moor@gmail.com>
wrote:

> Sorry about that; here is the direct link:
>
> http://mail-archives.apache.org/mod_mbox/mesos-user/201410.mbox/%3CCALnA0DwbqJdEm3at7SonsXmCwxnN3%3DCUrwgoPBHXJDoOFyJjig%40mail.gmail.com%3E
>
> On Thu, Oct 23, 2014 at 10:49 AM, Nils De Moor <nils.de.moor@gmail.com>
> wrote:
>
>> I had the same issue as you did; here is how I fixed it:
>> http://mail-archives.apache.org/mod_mbox/mesos-user/201410.mbox/browser
>>
>> On Thu, Oct 23, 2014 at 1:42 AM, Connor Doyle <connor@mesosphere.io>
>> wrote:
>>
>>> Hi Eduardo,
>>>
>>> There is a known defect in Mesos that matches your description:
>>> https://issues.apache.org/jira/browse/MESOS-1915
>>> https://issues.apache.org/jira/browse/MESOS-1884
>>>
>>> A fix will be included in the next release.
>>> https://reviews.apache.org/r/26486
>>>
>>> You see the killTask because the default --task_launch_timeout value for
>>> Marathon is 60 seconds.
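>>>
>>> If you want to rule that out, you can raise it to match your
>>> --executor_registration_timeout by adding something along these lines to
>>> your usual Marathon command line (the value is in milliseconds, if I
>>> remember correctly, so 300000 = 5 minutes):
>>>
>>>   --task_launch_timeout 300000
>>>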
>>> I created an issue to make the logging around this better:
>>> https://github.com/mesosphere/marathon/issues/732
>>>
>>> --
>>> Connor
>>>
>>>
>>> On Oct 22, 2014, at 16:18, Eduardo Jiménez <yoeduardoj@gmail.com> wrote:
>>>
>>> > Hi,
>>> >
>>> > I've started experimenting with Mesos using the Docker containerizer,
>>> and running a simple example left things in a very strange state.
>>> >
>>> > I have mesos-0.20.1, marathon-0.7 setup on EC2, using Amazon Linux:
>>> >
>>> > Linux <ip> 3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC
>>> 2014 x86_64 x86_64 x86_64 GNU/Linux
>>> >
>>> > Docker version 1.2.0, build fa7b24f/1.2.0
>>> >
>>> > I start the mesos slave with these relevant options:
>>> >
>>> > --cgroups_hierarchy=/cgroup
>>> > --containerizers=docker,mesos
>>> > --executor_registration_timeout=5mins
>>> > --isolation=cgroups/cpu,cgroups/mem
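>>> >
>>> > For context, the full invocation is roughly the following (the master
>>> > address is a placeholder for my actual ZooKeeper endpoint):
>>> >
>>> > sudo mesos-slave \
>>> >   --master=zk://<zk-host>:2181/mesos \
>>> >   --cgroups_hierarchy=/cgroup \
>>> >   --containerizers=docker,mesos \
>>> >   --executor_registration_timeout=5mins \
>>> >   --isolation=cgroups/cpu,cgroups/mem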
>>> >
>>> > I launched a very simple app, which is from the mesosphere examples:
>>> >
>>> > {
>>> >   "container": {
>>> >     "type": "DOCKER",
>>> >     "docker": {
>>> >       "image": "libmesos/ubuntu"
>>> >     }
>>> >   },
>>> >   "id": "ubuntu-docker2",
>>> >   "instances": "1",
>>> >   "cpus": "0.5",
>>> >   "mem": "512",
>>> >   "uris": [],
>>> >   "cmd": "while sleep 10; do date -u +%T; done"
>>> > }
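>>> >
>>> > I submitted it through Marathon's REST API, roughly like this (the host
>>> > and file name are placeholders for my setup):
>>> >
>>> > curl -X POST -H "Content-Type: application/json" \
>>> >   http://<marathon-host>:8080/v2/apps -d @ubuntu-docker2.json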
>>> >
>>> > The app launches, but then Mesos reports the task as KILLED, yet the
>>> Docker container is STILL running. Here's the sequence of logs from that
>>> mesos-slave.
>>> >
>>> > 1) Task gets created and assigned:
>>> >
>>> > I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task
>>> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework
>>> 20141017-172055-3489660938-5050-1603-0000
>>> > I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task
>>> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework
>>> 20141017-172055-3489660938-5050-1603-0000
>>> > I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task
>>> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor
>>> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
>>> '20141017-172055-3489660938-5050-1603-0000
>>> > I1022 17:44:13.989893 15195 docker.cpp:743] Starting container
>>> 'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task
>>> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor
>>> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework
>>> '20141017-172055-3489660938-5050-1603-0000'
>>> >
>>> > So far so good. The log statements right after "Starting container"
>>> are:
>>> >
>>> > I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task
>>> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
>>> 20141017-172055-3489660938-5050-1603-0000
>>> > I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update
>>> TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task
>>> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
>>> 20141017-172055-3489660938-5050-1603-0000 from @0.0.0.0:0
>>> > W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered
>>> executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework
>>> 20141017-172055-3489660938-5050-1603-0000 because it has no tasks
>>> > E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources
>>> for container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor
>>> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task
>>> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for
>>> terminal task, destroying container: No container found
>>> >
>>> > After this, there's several log messages like this:
>>> >
>>> > I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received
>>> status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for
>>> task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
>>> 20141017-172055-3489660938-5050-1603-0000
>>> > I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding
>>> status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for
>>> task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
>>> 20141017-172055-3489660938-5050-1603-0000 to master@10.0.0.147:5050
>>> > W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending
>>> status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for
>>> task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
>>> 20141017-172055-3489660938-5050-1603-0000
>>> > I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding
>>> status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for
>>> task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
>>> 20141017-172055-3489660938-5050-1603-0000 to master@10.0.0.147:5050
>>> >
>>> >
>>> > Eventually the TASK_KILLED update is acked and the Mesos UI shows the
>>> task as killed. By then, the process should be dead, but it's not.
>>> >
>>> > $ sudo docker ps
>>> > CONTAINER ID        IMAGE                    COMMAND                CREATED        STATUS       PORTS    NAMES
>>> > f76784e1af8b        libmesos/ubuntu:latest   "/bin/sh -c 'while s   5 hours ago    Up 5 hours            mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978
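>>> >
>>> > I can of course kill it by hand with something like
>>> >
>>> > $ sudo docker kill mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978
>>> >
>>> > but that doesn't explain how it got orphaned in the first place.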
>>> >
>>> >
>>> > The container shows in the UI like this:
>>> >
>>> > ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799   ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799   KILLED   5 hours ago   5 hours ago
>>> > And it's been running the whole time.
>>> >
>>> > There's no other logging indicating why killTask was invoked, which
>>> makes this extremely frustrating to debug.
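>>> >
>>> > The only other place I know to look is the master log, e.g. grepping
>>> > for the task ID (the log path depends on your --log_dir setting):
>>> >
>>> > $ grep ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 \
>>> >     /var/log/mesos/mesos-master.INFO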
>>> >
>>> > Has anyone seen something similar?
>>> >
>>> > Thanks,
>>> >
>>> > Eduardo
>>> >
>>>
>>>
>>
>
