mesos-user mailing list archives

From: Connor Doyle <connor....@gmail.com>
Subject: Re: Orphaned Docker containers in Mesos 0.20.1
Date: Fri, 03 Oct 2014 03:02:23 GMT
It doesn't appear to be related to the registration timeout; based on the logs, only about
4.5 seconds elapsed between task launch (20:44:39.18) and the kill (20:44:43.71), well under
the one-minute default.
--
Connor

> On Oct 2, 2014, at 14:24, Dick Davies <dick@hellooperator.net> wrote:
> 
> One thing to check: have you upped
> 
> --executor_registration_timeout
> 
> from the default of 1min? A docker pull can easily take longer than that.
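
For reference, raising that flag looks roughly like the following. The flag name and the
"mins" duration suffix are standard Mesos; the 5mins value and the /etc/mesos-slave flag-file
convention (used by the Mesosphere Ubuntu packages) are just examples:

  # Directly on the command line when starting the slave:
  $ mesos-slave --master=zk://<zk-quorum>/mesos --executor_registration_timeout=5mins

  # Or, with the Mesosphere packages, drop the value into a flag file and restart:
  $ echo 5mins | sudo tee /etc/mesos-slave/executor_registration_timeout
  $ sudo service mesos-slave restart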
> 
>> On 2 October 2014 22:18, Michael Babineau <michael.babineau@gmail.com> wrote:
>> I'm seeing an issue where tasks are being marked as killed but remain
>> running. The tasks all run via the native Docker containerizer and are
>> started from Marathon.
>> 
>> The net result is additional, orphaned Docker containers that must be
>> stopped/removed manually.
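
A rough sketch of that manual cleanup, relying on the fact that the Docker containerizer names
its containers mesos-<container-id> (visible in the docker ps output further down). Note this
sweeps up every Mesos-launched container, not just the orphaned ones, so only run it against
containers you have verified are strays:

  # Stop running containers carrying the mesos- name prefix, then remove them
  $ docker ps | grep ' mesos-' | awk '{print $1}' | xargs -r docker stop
  $ docker ps -a | grep ' mesos-' | awk '{print $1}' | xargs -r docker rm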
>> 
>> Versions:
>> - Mesos 0.20.1
>> - Marathon 0.7.1
>> - Docker 1.2.0
>> - Ubuntu 14.04
>> 
>> Environment:
>> - 3 ZK nodes, 3 Mesos Masters, and 3 Mesos Slaves (all separate instances)
>> on EC2
>> 
>> Here's the task in the Mesos UI:
>> 
>> (note that stderr continues to update with the latest container output)
>> 
>> Here's the still-running Docker container:
>> $ docker ps|grep 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
>> 3d451b8213ea
>> docker.thefactory.com/ace-serialization:f7aa1d4f46f72d52f5a20ef7ae8680e4acf88bc0
>> "\"/bin/sh -c 'java    26 minutes ago      Up 26 minutes       9990/tcp
>> mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
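
To confirm the container really outlived the kill, docker inspect can pull out its run state
and creation time for comparison against the slave log timestamps; the template fields below
are the usual top-level inspect fields, but treat this as a sketch:

  # Is it still running, and when was it created?
  $ docker inspect --format '{{.State.Running}} {{.Created}}' mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f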
>> 
>> Here are the Mesos logs associated with the task:
>> $ grep eda431d7-4a74-11e4-a320-56847afe9799 /var/log/mesos/mesos-slave.INFO
>> I1002 20:44:39.176024  1528 slave.cpp:1002] Got assigned task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:39.176257  1528 slave.cpp:1112] Launching task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:39.177287  1528 slave.cpp:1222] Queuing task
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' for executor
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> '20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:39.191769  1528 docker.cpp:743] Starting container
>> '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f' for task
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' (and executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799') of framework
>> '20140919-224934-1593967114-5050-1518-0000'
>> I1002 20:44:43.707033  1521 slave.cpp:1278] Asked to kill task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:43.707811  1521 slave.cpp:2088] Handling status update
>> TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000 from @0.0.0.0:0
>> W1002 20:44:43.708273  1521 slave.cpp:1354] Killing the unregistered
>> executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> 20140919-224934-1593967114-5050-1518-0000 because it has no tasks
>> E1002 20:44:43.708375  1521 slave.cpp:2205] Failed to update resources for
>> container 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f of executor
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 running task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 on status update for
>> terminal task, destroying container: No container found
>> I1002 20:44:43.708524  1521 status_update_manager.cpp:320] Received status
>> update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:43.708709  1521 status_update_manager.cpp:373] Forwarding status
>> update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000 to master@10.2.0.182:5050
>> I1002 20:44:43.728991  1526 status_update_manager.cpp:398] Received status
>> update acknowledgement (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:47:05.904324  1527 slave.cpp:2538] Monitoring executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> '20140919-224934-1593967114-5050-1518-0000' in container
>> '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f'
>> I1002 20:47:06.311027  1525 slave.cpp:1733] Got registration for executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> 20140919-224934-1593967114-5050-1518-0000 from executor(1)@10.2.1.34:29920
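
The tail end of the log is the telling part: the executor only registers at 20:47:06, more than
two minutes after the kill at 20:44:43, i.e. the docker pull/run finished after the slave had
already written the task off. One way to check whether the slave still tracks that container is
its state endpoint; a sketch, assuming the default 5051 port and that the container id shows up
somewhere in the 0.20.x slave state.json (e.g. in the executor's sandbox directory path):

  $ curl -s http://10.2.1.34:5051/state.json | python -m json.tool \
      | grep -C2 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f

If the id no longer appears under an active executor, the container is an orphan as far as the
slave is concerned.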
>> 
>> I'll typically see a barrage of these in association with a Marathon app
>> update (which deploys new tasks). Eventually, one container "sticks" and we
>> get a RUNNING task instead of a KILLED one.
>> 
>> Where else can I look?
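
Two other places worth checking are the executor sandbox under the slave work_dir and the
container's own log; a sketch assuming the default --work_dir of /tmp/mesos (the runs/latest
symlink points at the most recent container for that executor):

  # Executor sandbox: launch command plus stdout/stderr of the executor itself
  $ cd /tmp/mesos/slaves/*/frameworks/20140919-224934-1593967114-5050-1518-0000/executors/serialization.eda431d7-4a74-11e4-a320-56847afe9799/runs/latest
  $ tail stdout stderr

  # The container's side of the story
  $ docker logs mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f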
