mesos-user mailing list archives

From Andy Grove <andy.gr...@codefutures.com>
Subject Re: Docker executor issue
Date Tue, 30 Sep 2014 16:20:47 GMT
Hi Tim,

Thanks for helping with this. I am running mesos-master and mesos-slave
natively on the same host (my desktop). The only container in use is the
one being launched by the mesos-slave.

I will try your suggestion of running a simple command next.

Here is the slave log for this issue, though:

I0930 10:13:52.053177 30722 main.cpp:126] Build: 2014-09-29 15:35:37 by andy
I0930 10:13:52.053228 30722 main.cpp:128] Version: 0.20.1
I0930 10:13:53.055480 30722 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
I0930 10:13:53.058353 30722 main.cpp:149] Starting Mesos slave
I0930 10:13:53.059651 30722 slave.cpp:167] Slave started on 1)@127.0.1.1:5051
I0930 10:13:53.060072 30722 slave.cpp:278] Slave resources: cpus(*):8; mem(*):14963; disk(*):1.85648e+06; ports(*):[31000-32000]
I0930 10:13:53.060226 30722 slave.cpp:306] Slave hostname: davros
I0930 10:13:53.060253 30722 slave.cpp:307] Slave checkpoint: true
I0930 10:13:53.064975 30729 state.cpp:33] Recovering state from '/tmp/mesos/meta'
I0930 10:13:53.065352 30725 status_update_manager.cpp:193] Recovering status update manager
I0930 10:13:53.065626 30729 docker.cpp:577] Recovering Docker containers
I0930 10:13:53.065690 30724 containerizer.cpp:252] Recovering containerizer
I0930 10:13:54.055233 30723 slave.cpp:3198] Finished recovery
I0930 10:13:54.055448 30723 slave.cpp:589] New master detected at master@127.0.0.1:5050
I0930 10:13:54.055532 30723 slave.cpp:625] No credentials provided. Attempting to register without authentication
I0930 10:13:54.055537 30730 status_update_manager.cpp:167] New master detected at master@127.0.0.1:5050
I0930 10:13:54.055552 30723 slave.cpp:636] Detecting new master
I0930 10:13:54.928225 30724 slave.cpp:754] Registered with master master@127.0.0.1:5050; given slave ID 20140930-101303-16777343-5050-30690-0
I0930 10:13:54.928598 30724 slave.cpp:767] Checkpointing SlaveInfo to '/tmp/mesos/meta/slaves/20140930-101303-16777343-5050-30690-0/slave.info'
I0930 10:14:17.330390 30725 slave.cpp:1002] Got assigned task 0 for framework 20140930-101303-16777343-5050-30690-0000
I0930 10:14:17.330557 30725 slave.cpp:1112] Launching task 0 for framework 20140930-101303-16777343-5050-30690-0000
I0930 10:14:17.331296 30725 slave.cpp:1222] Queuing task '0' for executor default of framework '20140930-101303-16777343-5050-30690-0000
*I0930 10:14:17.333109 30730 docker.cpp:984] Starting container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' for executor 'default' and framework '20140930-101303-16777343-5050-30690-0000'*
I0930 10:14:20.062705 30730 slave.cpp:2538] Monitoring executor 'default' of framework '20140930-101303-16777343-5050-30690-0000' in container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'

The container is running quite happily at this point.

I0930 10:14:53.061337 30724 slave.cpp:3053] Current usage 0.76%. Max allowed age: 6.247043850997720days
*I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default of framework 20140930-101303-16777343-5050-30690-0000 because it did not register within 1mins*
I0930 10:15:17.332221 30728 docker.cpp:1473] Destroying container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
I0930 10:15:17.332308 30728 docker.cpp:1568] Running docker kill on container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81'
I0930 10:15:18.109361 30730 docker.cpp:1646] Executor for container 'ebb1dca6-cc9d-427f-8faa-f3f723f6ab81' has exited
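
The timing in the log can be checked mechanically. The following is a minimal sketch, assuming only the JDK; the class RegistrationTimeoutCheck and its methods are hypothetical helpers written for this illustration, not part of Mesos. It reads the time field from the "Launching task 0" line and the "Terminating executor" line and measures the gap, which comes out at roughly 60.001 seconds. That matches the "1mins" in the termination message, even though the container itself was started within a few milliseconds of the launch.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper, not part of Mesos: measures the gap between two
// glog-style lines copied from the slave log above.
final class RegistrationTimeoutCheck {

    // Two lines taken from the log above, re-joined where the mail client wrapped them.
    static final String LAUNCH =
        "I0930 10:14:17.330557 30725 slave.cpp:1112] Launching task 0 for framework "
        + "20140930-101303-16777343-5050-30690-0000";
    static final String TERMINATE =
        "*I0930 10:15:17.331712 30730 slave.cpp:3010] Terminating executor default "
        + "of framework 20140930-101303-16777343-5050-30690-0000 because it did not "
        + "register within 1mins*";

    // The second whitespace-separated field of each line is the time, HH:mm:ss.uuuuuu.
    static LocalTime timeOf(String logLine) {
        String[] fields = logLine.trim().split("\\s+");
        return LocalTime.parse(fields[1]);
    }

    // Assumes both lines were logged on the same day (no midnight rollover).
    static Duration elapsed(String earlier, String later) {
        return Duration.between(timeOf(earlier), timeOf(later));
    }

    public static void main(String[] args) {
        Duration gap = elapsed(LAUNCH, TERMINATE);
        System.out.println("launch -> terminate: " + gap.toMillis() + " ms");
    }
}
```

The same two-line comparison could be applied to the "Starting container" and "Monitoring executor" lines to see how long the container took to come up.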


Thanks,

Andy.

--
Andy Grove
VP Engineering
CodeFutures Corporation



On Mon, Sep 29, 2014 at 6:25 PM, Tim Chen <tim@mesosphere.io> wrote:

> Hi Andy,
>
> You don't need to specify -d as the docker containerizer will set it for
> you since we run all docker images detached.
>
> It seems like the executor just simply can't register with the slave. Can
> you try just running a simple command without Docker that takes longer than
> the executor registration timeout to see if you see the same error?
>
> Also do you run the mesos slave in a docker container as well?
>
> It would be great if you can share the slave log as Vinod suggested too.
>
> Tim
>
>
>
>
>
>
> On Mon, Sep 29, 2014 at 5:15 PM, Vinod Kone <vinodkone@gmail.com> wrote:
>
>> I'll let Tim Chen help you out here since he has more context. Some slave
>> logs around the failed container launch would be helpful.
>>
>>
>> On Mon, Sep 29, 2014 at 5:03 PM, Andy Grove <andy.grove@codefutures.com>
>> wrote:
>>
>>> Ignore my comment about docker run not returning. That is incorrect.
>>>
>>> Thanks,
>>>
>>> Andy.
>>>
>>> --
>>> Andy Grove
>>> VP Engineering
>>> CodeFutures Corporation
>>>
>>>
>>>
>>> On Mon, Sep 29, 2014 at 5:59 PM, Andy Grove <andy.grove@codefutures.com>
>>> wrote:
>>>
>>>> Hi Vinod,
>>>>
>>>> Thanks for the quick response but the image is already on the slave and
>>>> I see the container being launched almost immediately when my framework
>>>> starts (within 1-2 seconds). If I keep running docker ps, this is the last
>>>> output I see before the container is killed:
>>>>
>>>> $ docker ps
>>>> CONTAINER ID        IMAGE                                   COMMAND                CREATED             STATUS              PORTS               NAMES
>>>> 45f992c2781f        codefutures/dbshards_zookeeper:latest   "/bin/sh -c '/opt/zo   59 seconds ago      Up 58 seconds
>>>>
>>>> I am using mesos 0.20.1 and docker 1.2.0 on Ubuntu 14.04.
>>>>
>>>> So the container is running fine. It is a long running service i.e. the
>>>> docker run command will never return. Should I be providing some option so
>>>> that the docker executor passes the -d flag to the docker run command? I
>>>> guess I should start looking through the mesos source so I can see how this
>>>> works.
>>>>
>>>> Thanks,
>>>>
>>>> Andy.
>>>>
>>>> --
>>>> Andy Grove
>>>> VP Engineering
>>>> CodeFutures Corporation
>>>>
>>>>
>>>>
>>>> On Mon, Sep 29, 2014 at 5:49 PM, Vinod Kone <vinodkone@gmail.com>
>>>> wrote:
>>>>
>>>>> Try increasing the executor registration timeout on the slave
>>>>> (--executor_registration_timeout) to give docker more time to do a pull
>>>>> of the image.
>>>>>
>>>>> On Mon, Sep 29, 2014 at 4:41 PM, Andy Grove <
>>>>> andy.grove@codefutures.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been working on a prototype Mesos framework to launch docker
>>>>>> containers. I'm getting as far as seeing my container start up, but
>>>>>> after one minute it gets killed due to:
>>>>>>
>>>>>> Terminating executor default of framework
>>>>>> 20140929-155916-16777343-5050-2708-0004 because it did not register
>>>>>> within 1mins
>>>>>>
>>>>>> Here is the code I am using in my scheduler, which was based on one
>>>>>> of the examples:
>>>>>>
>>>>>>   @Override
>>>>>>   public void resourceOffers(SchedulerDriver schedulerDriver,
>>>>>>                              List<Protos.Offer> offers) {
>>>>>>     logger.info("resourceOffers() with {} offers", offers.size());
>>>>>>
>>>>>>     for (Protos.Offer offer : offers) {
>>>>>>
>>>>>>       List<Protos.TaskInfo> tasks = new ArrayList<Protos.TaskInfo>();
>>>>>>       if (launchedTasks < totalTasks) {
>>>>>>         Protos.TaskID taskId = Protos.TaskID.newBuilder()
>>>>>>             .setValue(Integer.toString(launchedTasks++)).build();
>>>>>>
>>>>>>         logger.info("Launching task " + taskId.getValue());
>>>>>>
>>>>>>         // docker image info
>>>>>>         Protos.ContainerInfo.DockerInfo.Builder dockerInfoBuilder =
>>>>>>             Protos.ContainerInfo.DockerInfo.newBuilder();
>>>>>>         dockerInfoBuilder.setImage("codefutures/dbshards_zookeeper");
>>>>>>
>>>>>>         // container info
>>>>>>         Protos.ContainerInfo.Builder containerInfoBuilder =
>>>>>>             Protos.ContainerInfo.newBuilder();
>>>>>>         containerInfoBuilder.setType(Protos.ContainerInfo.Type.DOCKER);
>>>>>>         containerInfoBuilder.setDocker(dockerInfoBuilder.build());
>>>>>>
>>>>>>         // create executor for the container
>>>>>>         Protos.ExecutorInfo executor = Protos.ExecutorInfo.newBuilder()
>>>>>>             .setExecutorId(Protos.ExecutorID.newBuilder().setValue("default"))
>>>>>>             .setCommand(Protos.CommandInfo.newBuilder().setShell(false))
>>>>>>             .setContainer(containerInfoBuilder)
>>>>>>             .setName("Test Executor (Docker)")
>>>>>>             .setSource("docker_test")
>>>>>>             .build();
>>>>>>
>>>>>>         // create task to run
>>>>>>         Protos.TaskInfo task = Protos.TaskInfo.newBuilder()
>>>>>>             .setName("task " + taskId.getValue())
>>>>>>             .setTaskId(taskId)
>>>>>>             .setSlaveId(offer.getSlaveId())
>>>>>>             .addResources(Protos.Resource.newBuilder()
>>>>>>                 .setName("cpus")
>>>>>>                 .setType(Protos.Value.Type.SCALAR)
>>>>>>                 .setScalar(Protos.Value.Scalar.newBuilder().setValue(1)))
>>>>>>             .addResources(Protos.Resource.newBuilder()
>>>>>>                 .setName("mem")
>>>>>>                 .setType(Protos.Value.Type.SCALAR)
>>>>>>                 .setScalar(Protos.Value.Scalar.newBuilder().setValue(128)))
>>>>>>             .setExecutor(Protos.ExecutorInfo.newBuilder(executor))
>>>>>>             .build();
>>>>>>
>>>>>>         tasks.add(task);
>>>>>>       }
>>>>>>       Protos.Filters filters =
>>>>>>           Protos.Filters.newBuilder().setRefuseSeconds(1).build();
>>>>>>
>>>>>>       schedulerDriver.launchTasks(offer.getId(), tasks, filters);
>>>>>>     }
>>>>>>
>>>>>>   }
>>>>>>
>>>>>> Am I missing some steps with this approach?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andy.
>>>>>>
>>>>>> --
>>>>>> Andy Grove
>>>>>> VP Engineering
>>>>>> CodeFutures Corporation
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
