mesos-user mailing list archives

From Jason Giedymin <jason.giedy...@gmail.com>
Subject Re: docker based executor
Date Fri, 17 Apr 2015 20:47:08 GMT
Try: 

until <something>; do
  echo "waiting for something to do something"
  sleep 5
done

You can put this in a bash file and run that.
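
As a rough illustration of the loop above, here is a minimal sketch of such a script. The helper name `wait_until` and its arguments are hypothetical (they are not part of Mesos or Docker), and a bounded attempt count is added so the script cannot wait forever:

```shell
#!/bin/sh
# Hypothetical helper: wait_until ATTEMPTS DELAY COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, sleeping DELAY seconds between checks.
# Returns 0 as soon as COMMAND succeeds, 1 after ATTEMPTS failed checks.
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_i=1
  until "$@"; do
    if [ "$wu_i" -ge "$wu_attempts" ]; then
      return 1
    fi
    echo "waiting for condition (attempt $wu_i of $wu_attempts)"
    sleep "$wu_delay"
    wu_i=$((wu_i + 1))
  done
  return 0
}
```

For instance, `wait_until 12 5 test -e /tmp/ready` would check every 5 seconds for a marker file (the path is only an example) and give up after 12 failed checks.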

If you have a Dockerfile, it would be easier to debug.

-Jason

> On Apr 17, 2015, at 4:24 PM, Tyson Norris <tnorris@adobe.com> wrote:
> 
> Yes, agreed that the command should not exit - but the container is killed about 0.5 s after launch regardless of whether the command terminates, which is why I’ve been experimenting with commands that have varied exit times.
> 
> For example, forget for the moment about the executor needing to register.
> 
> Using the command:
> echo testing123c && sleep 0.1 && echo testing456c
> -> I see the expected output in stdout, and the container is destroyed (as expected) because the command exits quickly
> 
> Using the command:
> echo testing123d && sleep 0.6 && echo testing456d
> -> I do NOT see the expected output in stdout (I only get testing123d), because the container is destroyed prematurely after ~0.5 seconds
> 
> Using the “real” storm command, I get no output in stdout, probably because no output is generated within 0.5 seconds of launch - it is a bit of a pig to start up, so I’m currently just trying to execute some other commands for testing purposes.
> 
> So I’m guessing this is a timeout issue, or else that the container is reaped inappropriately, or something else… looking through this code, I’m trying to figure out the steps taken during executor launch:
> https://github.com/apache/mesos/blob/00318fc1b30fc0961c2dfa4d934c37866577d801/src/slave/containerizer/docker.cpp#L715
> 
> Thanks
> Tyson
>   
> 
> 
> 
> 
>> On Apr 17, 2015, at 12:53 PM, Jason Giedymin <jason.giedymin@gmail.com> wrote:
>> 
>> What is the last command you have docker doing?
>> 
>> If that command exits, then Docker will begin to stop the container.
>> 
>> -Jason
>> 
>>> On Apr 17, 2015, at 3:23 PM, Tyson Norris <tnorris@adobe.com> wrote:
>>> 
>>> Hi -
>>> I am looking at revving the mesos-storm framework to be dockerized (and simpler).
>>> I’m using mesos 0.22.0-1.0.ubuntu1404
>>> mesos master + mesos slave are deployed in docker containers, in case it matters.
>>> 
>>> I have the storm (nimbus) framework launching fine as a docker container, but launching tasks for a topology is having problems related to using a docker-based executor.
>>> 
>>> For example:
>>> 
>>> TaskInfo task = TaskInfo.newBuilder()
>>>   .setName("worker " + slot.getNodeId() + ":" + slot.getPort())
>>>   .setTaskId(taskId)
>>>   .setSlaveId(offer.getSlaveId())
>>>   .setExecutor(ExecutorInfo.newBuilder()
>>>                   .setExecutorId(ExecutorID.newBuilder().setValue(details.getId()))
>>>                   .setData(ByteString.copyFromUtf8(executorDataStr))
>>>                   .setContainer(ContainerInfo.newBuilder()
>>>                           .setType(ContainerInfo.Type.DOCKER)
>>>                           .setDocker(ContainerInfo.DockerInfo.newBuilder()
>>>                                           .setImage("mesos-storm")))
>>>                   .setCommand(CommandInfo.newBuilder().setShell(true).setValue("storm supervisor storm.mesos.MesosSupervisor"))
>>>       //rest is unchanged from existing mesos-storm framework code
>>> 
>>> The executor launches and exits quickly - see the log message: Executor for container '88ce3658-7d9c-4b5f-b69a-cb5e48125dfd' has exited
>>> 
>>> It seems like mesos loses track of the executor? I understand there is a 1 min timeout on registering the executor, but the exit happens well before 1 minute.
>>> 
>>> I tried a few alternate commands to experiment, and I can see in the stdout for the task that
>>> "echo testing123 && echo testing456"
>>> prints to stdout correctly, both testing123 and testing456
>>> 
>>> however:
>>> "echo testing123a && sleep 10 && echo testing456a"
>>> prints only testing123a, presumably because the container is lost and destroyed before the sleep time is up.
>>> 
>>> So it’s like the container for the executor is only allowed to run for 0.5 seconds, then it is detected as exited, and the task is lost.
>>> 
>>> Thanks for any advice.
>>> 
>>> Tyson
>>> 
>>> 
>>> 
>>> slave logs look like:
>>> mesosslave_1  | I0417 19:07:27.461230    11 slave.cpp:1121] Got assigned task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1  | I0417 19:07:27.461479    11 slave.cpp:1231] Launching task mesos-slave1.service.consul-31000 for framework 20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1  | I0417 19:07:27.463250    11 slave.cpp:4160] Launching executor insights-1-1429297638 of framework 20150417-190611-2801799596-5050-1-0000 in work directory '/tmp/mesos/slaves/20150417-190611-2801799596-5050-1-S0/frameworks/20150417-190611-2801799596-5050-1-0000/executors/insights-1-1429297638/runs/6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1  | I0417 19:07:27.463444    11 slave.cpp:1378] Queuing task 'mesos-slave1.service.consul-31000' for executor insights-1-1429297638 of framework '20150417-190611-2801799596-5050-1-0000
>>> mesosslave_1  | I0417 19:07:27.467200     7 docker.cpp:755] Starting container '6539127f-9dbb-425b-86a8-845b748f0cd3' for executor 'insights-1-1429297638' and framework '20150417-190611-2801799596-5050-1-0000'
>>> mesosslave_1  | I0417 19:07:27.985935     7 docker.cpp:1333] Executor for container '6539127f-9dbb-425b-86a8-845b748f0cd3' has exited
>>> mesosslave_1  | I0417 19:07:27.986359     7 docker.cpp:1159] Destroying container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1  | I0417 19:07:27.986021     9 slave.cpp:3135] Monitoring executor 'insights-1-1429297638' of framework '20150417-190611-2801799596-5050-1-0000' in container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1  | I0417 19:07:27.986464     7 docker.cpp:1248] Running docker stop on container '6539127f-9dbb-425b-86a8-845b748f0cd3'
>>> mesosslave_1  | I0417 19:07:28.286761    10 slave.cpp:3186] Executor 'insights-1-1429297638' of framework 20150417-190611-2801799596-5050-1-0000 has terminated with unknown status
>>> mesosslave_1  | I0417 19:07:28.288784    10 slave.cpp:2508] Handling status update TASK_LOST (UUID: 0795a58b-f487-42e2-aaa1-a26fe6834ed7) for task mesos-slave1.service.consul-31000 of framework 20150417-190611-2801799596-5050-1-0000 from @0.0.0.0:0
>>> mesosslave_1  | W0417 19:07:28.289227     9 docker.cpp:841] Ignoring updating unknown container: 6539127f-9dbb-425b-86a8-845b748f0cd3
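
As a rough check of the ~0.5 second figure, the two docker.cpp timestamps in the slave log ("Starting container" at 19:07:27.467200 and "Executor for container ... has exited" at 19:07:27.985935) can be subtracted. The sketch below assumes both timestamps fall on the same day; the helper name `to_seconds` is only illustrative:

```shell
#!/bin/sh
# Timestamps copied from the slave log lines quoted in this thread.
start="19:07:27.467200"   # docker.cpp:755  Starting container
end="19:07:27.985935"     # docker.cpp:1333 Executor for container ... has exited

# Convert HH:MM:SS.ffffff to seconds since midnight.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%.6f", $1 * 3600 + $2 * 60 + $3 }'
}

# Subtract the two values to get the container lifetime in seconds.
elapsed=$(awk -v s="$(to_seconds "$start")" -v e="$(to_seconds "$end")" \
  'BEGIN { printf "%.6f", e - s }')

echo "container lifetime: ${elapsed} s"
```

This gives a lifetime of roughly 0.52 seconds, which matches the behaviour described for the `sleep 0.1` and `sleep 0.6` experiments.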
>>> 
>>> nimbus logs (framework) look like:
>>> 2015-04-17T19:07:28.302+0000 s.m.MesosNimbus [INFO] Received status update: task_id {
>>> value: "mesos-slave1.service.consul-31000"
>>> }
>>> state: TASK_LOST
>>> message: "Container terminated"
>>> slave_id {
>>> value: "20150417-190611-2801799596-5050-1-S0"
>>> }
>>> timestamp: 1.429297648286981E9
>>> source: SOURCE_SLAVE
>>> reason: REASON_EXECUTOR_TERMINATED
>>> 11: "\a\225\245\213\364\207B\342\252\241\242o\346\203N\327"
> 
