mesos-user mailing list archives

From Tim Chen <...@mesosphere.io>
Subject Re: Can't start docker container when SSL_ENABLED is on.
Date Thu, 29 Oct 2015 05:48:32 GMT
Does running a task without a Docker container (i.e. with the Mesos
containerizer) work with SSL in your environment?
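
As a rough illustration of that check, the sketch below builds a Marathon app definition that has no "container" section, so the task should go through the Mesos containerizer (assuming the slave has it enabled). The app id, command, resource sizes, and the MARATHON_URL environment variable are placeholders chosen for illustration, not values from this thread.

```python
import json
import os
import urllib.request


def build_app():
    """Return a Marathon app definition with no "container" section."""
    return {
        "id": "/ssl-check",  # hypothetical app id
        "cmd": "sleep 60",   # plain command, no Docker image involved
        "cpus": 0.1,
        "mem": 32,
        "instances": 1,
    }


def submit(marathon_url, app):
    """POST the app definition to Marathon's /v2/apps endpoint."""
    request = urllib.request.Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    app = build_app()
    print(json.dumps(app, indent=2))
    # MARATHON_URL is an assumed variable name; nothing is sent if it is unset.
    marathon_url = os.environ.get("MARATHON_URL")
    if marathon_url:
        print("Marathon responded with HTTP", submit(marathon_url, app))
```

If this task reaches TASK_RUNNING with the SSL variables set, the problem is probably specific to the Docker executor path.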

Tim

On Wed, Oct 28, 2015 at 10:19 PM, Xiaodong Zhang <xdzhang@alauda.io> wrote:

> Thanks a lot. I found the log files on the slave.
>
> For one of the tasks:
>
> Stdout:
>
> --container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444"
> --docker="/home/ubuntu/luna/bin/docker" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444"
> --stop_timeout="0ns"
> --container="mesos-20151029-043755-3549436724-5050-5674-S0.e2c2580f-8082-4f17-b0cc-4e32e040d444"
> --docker="/home/ubuntu/luna/bin/docker" --help="false"
> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
> --sandbox_directory="/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444"
> --stop_timeout="0ns"
> Shutting down
>
> Stderr:
>
> I1029 05:14:06.529364 27862 fetcher.cpp:414] Fetcher Info:
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20151029-043755-3549436724-5050-5674-S0","items":[{"action":"BYPASS_CACHE","uri":{"extract":false,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20151029-043755-3549436724-5050-5674-S0\/frameworks\/20151029-043755-3549436724-5050-5674-0000\/executors\/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f\/runs\/e2c2580f-8082-4f17-b0cc-4e32e040d444"}
> I1029 05:14:06.530562 27862 fetcher.cpp:369] Fetching URI
> 'file:///etc/.dockercfg'
> I1029 05:14:06.530580 27862 fetcher.cpp:243] Fetching directly into the
> sandbox directory
> I1029 05:14:06.530594 27862 fetcher.cpp:180] Fetching URI
> 'file:///etc/.dockercfg'
> I1029 05:14:06.530609 27862 fetcher.cpp:160] Copying resource with
> command:cp '/etc/.dockercfg'
> '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
> I1029 05:14:06.532165 27862 fetcher.cpp:446] Fetched
> 'file:///etc/.dockercfg' to
> '/tmp/mesos/slaves/20151029-043755-3549436724-5050-5674-S0/frameworks/20151029-043755-3549436724-5050-5674-0000/executors/e4a3bed5-64e6-4970-8bb1-df6404656a48.e3a20f3b-7dfb-11e5-b57b-0247b493b22f/runs/e2c2580f-8082-4f17-b0cc-4e32e040d444/.dockercfg'
> I1029 05:14:07.782054 27955 exec.cpp:133] Version: 0.24.1
> I1029 05:14:07.785039 27963 exec.cpp:462] Slave exited ... shutting down
> E1029 05:14:07.785158 27964 socket.hpp:174] Shutdown failed on fd=7:
> Transport endpoint is not connected [107]
>
> From: haosdent <haosdent@gmail.com>
> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
> Date: Thursday, October 29, 2015 at 1:13 PM
> To: user <user@mesos.apache.org>
> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>
>
>
> I captured how I find the task logs in my local web UI. Could you find the
> stderr and stdout for your tasks by following the screenshots above?
>
> On Thu, Oct 29, 2015 at 1:07 PM, Xiaodong Zhang <xdzhang@alauda.io> wrote:
>
>> I didn’t see any useful info.
>>
>> In the Mesos slave log, there is this line:
>> I1029 03:29:53.160143  9292 slave.cpp:3399] Executor
>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>> of framework 20151029-031549-1294671788-5050-4937-0000 terminated with
>> signal Killed
>>
>> I checked the log from a normal run, and it shows:
>>
>> I1014 15:22:21.276007 23163 slave.cpp:3326] Executor
>> 'ffc08dce-997f-41f7-9b03-57c1b4bc1f85.47ed02aa-7285-11e5-80d7-000d3a8033de'
>> of framework 20150814-115157-1677721866-5050-6185-0000 exited with
>> status 0
>>
>> Is this helpful?
>>
>> From: Xiaodong Zhang <xdzhang@alauda.io>
>> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
>> Date: Thursday, October 29, 2015 at 12:59 PM
>> To: "user@mesos.apache.org" <user@mesos.apache.org>
>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>
>>
>> The web UI has a LOG link; clicking it shows this:
>>
>> I1029 04:44:32.293445  5697 http.cpp:321] HTTP GET for /master/state.json
>> from 114.113.20.135:55682 with User-Agent='Mozilla/5.0 (Macintosh; Intel
>> Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko)
>> Chrome/46.0.2490.71 Safari/537.36'
>> I1029 04:44:34.533504  5704 master.cpp:4613] Sending 1 offers to
>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>> I1029 04:44:34.539579  5702 master.cpp:2739] Processing ACCEPT call for
>> offers: [ 20151029-043755-3549436724-5050-5674-O2 ] on slave
>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (
>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>> I1029 04:44:34.539710  5702 hierarchical.hpp:814] Recovered cpus(*):1;
>> mem(*):999; disk(*):3962; ports(*):[31000-32000] (total: cpus(*):1;
>> mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave
>> 20151029-043755-3549436724-5050-5674-S0 from framework
>> 20151029-043755-3549436724-5050-5674-0000
>> I1029 04:44:37.360901  5703 master.cpp:4294] Performing implicit task
>> state reconciliation for framework
>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>> I1029 04:44:40.539989  5704 master.cpp:4613] Sending 1 offers to
>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>> I1029 04:44:40.610321  5702 master.cpp:2739] Processing ACCEPT call for
>> offers: [ 20151029-043755-3549436724-5050-5674-O3 ] on slave
>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (
>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>> I1029 04:44:40.610846  5702 master.hpp:170] Adding task
>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>> with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave
>> 20151029-043755-3549436724-5050-5674-S0 (
>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>> I1029 04:44:40.610911  5702 master.cpp:3069] Launching task
>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>> of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373 with
>> resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] on slave
>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (
>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>> I1029 04:44:40.611095  5702 hierarchical.hpp:814] Recovered
>> cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863,
>> 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962;
>> ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256;
>> ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0
>> from framework 20151029-043755-3549436724-5050-5674-0000
>> I1029 04:44:43.324970  5698 http.cpp:321] HTTP GET for /master/state.json
>> from 114.113.20.135:55682 with User-Agent='Mozilla/5.0 (Macintosh; Intel
>> Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko)
>> Chrome/46.0.2490.71 Safari/537.36'
>> I1029 04:44:46.546671  5703 master.cpp:4613] Sending 1 offers to
>> framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>> I1029 04:44:46.557266  5699 master.cpp:2739] Processing ACCEPT call for
>> offers: [ 20151029-043755-3549436724-5050-5674-O4 ] on slave
>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (
>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com) for framework
>> 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373
>> I1029 04:44:46.557394  5699 hierarchical.hpp:814] Recovered
>> cpus(*):0.9375; mem(*):743; disk(*):3962; ports(*):[31000-31863,
>> 31865-32000] (total: cpus(*):1; mem(*):999; disk(*):3962;
>> ports(*):[31000-32000], allocated: cpus(*):0.0625; mem(*):256;
>> ports(*):[31864-31864]) on slave 20151029-043755-3549436724-5050-5674-S0
>> from framework 20151029-043755-3549436724-5050-5674-0000
>> I1029 04:44:47.267562  5700 master.cpp:4069] Status update TASK_FAILED
>> (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task
>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>> of framework 20151029-043755-3549436724-5050-5674-0000 from slave
>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (
>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>> I1029 04:44:47.267645  5700 master.cpp:4108] Forwarding status update
>> TASK_FAILED (UUID: 0ea607fc-bf24-4bda-b107-55a54aba31cf) for task
>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>> of framework 20151029-043755-3549436724-5050-5674-0000
>> I1029 04:44:47.267774  5700 master.cpp:5576] Updating the latest state of
>> task
>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>> of framework 20151029-043755-3549436724-5050-5674-0000 to TASK_FAILED
>> I1029 04:44:47.267907  5700 hierarchical.hpp:814] Recovered
>> cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] (total: cpus(*):1;
>> mem(*):999; disk(*):3962; ports(*):[31000-32000], allocated: ) on slave
>> 20151029-043755-3549436724-5050-5674-S0 from framework
>> 20151029-043755-3549436724-5050-5674-0000
>> I1029 04:44:47.289356  5698 master.cpp:5644] Removing task
>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>> with resources cpus(*):0.0625; mem(*):256; ports(*):[31864-31864] of
>> framework 20151029-043755-3549436724-5050-5674-0000 on slave
>> 20151029-043755-3549436724-5050-5674-S0 at slave(1)@50.112.136.148:5051 (
>> ec2-50-112-136-148.us-west-2.compute.amazonaws.com)
>> I1029 04:44:47.289459  5698 master.cpp:3398] Processing ACKNOWLEDGE call
>> 0ea607fc-bf24-4bda-b107-55a54aba31cf for task
>> e4a3bed5-64e6-4970-8bb1-df6404656a48.c4239b84-7df7-11e5-b57b-0247b493b22f
>> of framework 20151029-043755-3549436724-5050-5674-0000 (marathon) at
>> scheduler-b532233f-2fc5-4455-b1e6-7a66ae79a8b9@172.31.43.77:53373 on
>> slave 20151029-043755-3549436724-5050-5674-S0
>>
>>
>>
>> From: haosdent <haosdent@gmail.com>
>> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
>> Date: Thursday, October 29, 2015 at 12:02 PM
>> To: user <user@mesos.apache.org>
>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>
>> Oh, I mean your task logs. You can get them from the Mesos web UI.
>>
>> On Thu, Oct 29, 2015 at 11:52 AM, Xiaodong Zhang <xdzhang@alauda.io>
>> wrote:
>>
>>> Thanks for your reply.
>>>
>>> Yes, I built Mesos with `--enable-libevent --enable-ssl`. If I don’t
>>> provide the key and pem when starting the slave, it fails to register.
>>> (That means SSL is working, right?)
>>>
>>> As I said, the odd thing is that the container never runs (`docker ps -a`
>>> shows nothing), so it can’t have any stdout or stderr.
>>>
>>> From: haosdent <haosdent@gmail.com>
>>> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
>>> Date: Thursday, October 29, 2015 at 11:47 AM
>>> To: user <user@mesos.apache.org>
>>> Subject: Re: Can't start docker container when SSL_ENABLED is on.
>>>
>>> Did you compile Mesos with SSL support? The default build doesn't
>>> include SSL. And does the Docker container have stdout and stderr?
>>>
>>> On Thu, Oct 29, 2015 at 11:41 AM, Xiaodong Zhang <xdzhang@alauda.io>
>>> wrote:
>>>
>>>> My scenario is as described in my previous email: the masters and slaves
>>>> are in different IaaS providers. The slaves can now register with the
>>>> masters with SSL_ENABLED on.
>>>>
>>>> But I've hit another problem: the slaves can’t run containers. (The odd
>>>> thing is that they can pull the image successfully but cannot run the
>>>> container; `docker ps -a` lists nothing.)
>>>>
>>>> The logs look like this:
>>>>
>>>> I1029 03:29:45.967741  9288 docker.cpp:758] Starting container
>>>> 'd4f4e236-0d0a-492c-86df-eef48a414e23' for task
>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>>>> (and executor
>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713')
>>>> of framework '20151029-031549-1294671788-5050-4937-0000'
>>>> I1029 03:29:48.044148  9292 docker.cpp:382] Checkpointing pid 12062 to
>>>> '/tmp/mesos/meta/slaves/20151029-031549-1294671788-5050-4937-S0/frameworks/20151029-031549-1294671788-5050-4937-0000/executors/279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713/runs/d4f4e236-0d0a-492c-86df-eef48a414e23/pids/forked.pid'
>>>> I1029 03:29:53.159361  9292 docker.cpp:1576] Executor for container
>>>> 'd4f4e236-0d0a-492c-86df-eef48a414e23' has exited
>>>> I1029 03:29:53.159572  9292 docker.cpp:1374] Destroying container
>>>> 'd4f4e236-0d0a-492c-86df-eef48a414e23'
>>>> I1029 03:29:53.159822  9292 docker.cpp:1478] Running docker stop on
>>>> container 'd4f4e236-0d0a-492c-86df-eef48a414e23'
>>>> I1029 03:29:53.160143  9292 slave.cpp:3399] Executor
>>>> '279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713'
>>>> of framework 20151029-031549-1294671788-5050-4937-0000 terminated with
>>>> signal Killed
>>>> I1029 03:29:53.160884  9292 slave.cpp:2696] Handling status update
>>>> TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for task
>>>> 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713
>>>> of framework 20151029-031549-1294671788-5050-4937-0000 from @0.0.0.0:0
>>>> W1029 03:29:53.161247  9288 docker.cpp:986] Ignoring updating unknown
>>>> container: d4f4e236-0d0a-492c-86df-eef48a414e23
>>>> I1029 03:29:53.161548  9293 status_update_manager.cpp:322] Received
>>>> status update TASK_FAILED (UUID: 27a2080a-8807-449e-9077-837ec45b4c51) for
>>>> task
>>>> 279bcb34-f705-4857-96ad-d96843b848fb.4b3abdcd-7ded-11e5-a82d-0240afabf713
>>>> of framework 20151029-031549-1294671788-5050-4937-0000
>>>>
>>>> I run the master node with these environment variables:
>>>>
>>>> SSL_SUPPORT_DOWNGRADE=true
>>>> SSL_ENABLED=true
>>>> SSL_KEY_FILE=/home/ubuntu/xx.key
>>>> SSL_CERT_FILE=/home/ubuntu/xx.pem
>>>>
>>>> And the slave node with these:
>>>>
>>>> SSL_ENABLED=true
>>>> SSL_KEY_FILE=/home/ubuntu/xx.key
>>>> SSL_CERT_FILE=/home/ubuntu/xx.pem
>>>> LIBPROCESS_ADVERTISE_IP=xxx.xxx.xxx.xxx
>>>>
>>>> When I remove all the SSL environment variables, the slaves work well.
>>>>
>>>> Did I miss something?
>>>>
>>>> Version:
>>>>
>>>> Mesos 0.24.1
>>>> Marathon 0.9.2
>>>>
>>>> OS:
>>>> Ubuntu 14.04
>>>>
>>>>
>>>>
>>>> From: Anindya Sinha <anindya.sinha@gmail.com>
>>>> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
>>>> Date: Wednesday, October 28, 2015 at 2:32 PM
>>>> To: "user@mesos.apache.org" <user@mesos.apache.org>
>>>> Subject: Re: How to tell master which ip to connect.
>>>>
>>>>
>>>>
>>>> On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang <xdzhang@alauda.io>
>>>> wrote:
>>>>
>>>>> It works! Thanks a lot.
>>>>>
>>>>
>>>> Ok. So we should expose advertise_ip and advertise_port as command line
>>>> options for mesos-slave as well (instead of using the environment
>>>> variables)? Opened https://issues.apache.org/jira/browse/MESOS-3809.
>>>>
>>>>
>>>>>
>>>>> Another question: do masters and slaves communicate with each other in
>>>>> a secure way? Is the data encrypted? I want to make sure that deploying
>>>>> masters and slaves in different IaaS providers is production-ready.
>>>>>
>>>>> From: haosdent <haosdent@gmail.com>
>>>>> Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
>>>>> Date: Wednesday, October 28, 2015 at 10:23 AM
>>>>> To: user <user@mesos.apache.org>
>>>>> Subject: Re: How to tell master which ip to connect.
>>>>>
>>>>> Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and
>>>>> `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
>>>>>
>>>>> On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang <xdzhang@alauda.io>
>>>>> wrote:
>>>>>
>>>>>> Hi team,
>>>>>>
>>>>>> My scenario is like this:
>>>>>>
>>>>>> My master nodes are deployed in AWS and my slaves are in Azure, so they
>>>>>> communicate via public IPs.
>>>>>> I ran into trouble when the slaves try to register with the master.
>>>>>> The slaves can get the master’s public IP address and can send a
>>>>>> registration request, but they can only send their private IP to the
>>>>>> master. (They don’t know their public IP, so they can’t bind to a
>>>>>> public IP via the --ip flag.) As a result, the masters can’t connect to
>>>>>> the slaves. How can a slave tell the master which IP the master should
>>>>>> connect to? (I can’t find any flag like --advertise_ip in master.)
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>
