mesos-user mailing list archives

From Xiaodong Zhang <xdzh...@alauda.io>
Subject Re: Running mesos-slave in the docker that leave many zombie process
Date Wed, 22 Feb 2017 18:32:20 GMT
Hi guys,

Following up on this issue: is there anything I can do, or any other information I can provide?

Thanks,
Xiaodong

From: Xiaodong Zhang <xdzhang@alauda.io>
Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
Date: Tuesday, February 21, 2017, 6:18 PM
To: "user@mesos.apache.org" <user@mesos.apache.org>
Subject: Re: Running mesos-slave in the docker that leave many zombie process

Hi @Haosdent, thanks for your reply.
I tried 1.0.3 and 1.1.0. Both have the same problem.

  1. Create a container.
     [inline screenshot, not preserved in the archive]

  2. Restart the container. This works well.

  3. Remove the executor.
     [inline screenshot, not preserved in the archive]

If I restart mesos-slave, the zombie container is gone.

Any thoughts?

Thanks,
Xiaodong

From: haosdent <haosdent@gmail.com>
Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
Date: Tuesday, February 21, 2017, 1:19 AM
To: user <user@mesos.apache.org>
Subject: Re: Running mesos-slave in the docker that leave many zombie process

Hi @xiaodong, could you check whether this problem still exists after 1.0? I recall that Mesos
changed the recovery of Docker containers after 1.0 to avoid this.

On Tue, Feb 21, 2017 at 1:13 AM, Xiaodong Zhang <xdzhang@alauda.io> wrote:
Hi guys.

I tried to fix the zombie containers as suggested in this thread. It works well when I restart
mesos-slave: no zombie containers occur. But it only helps when restarting mesos-slave.

If I restart the executor, the executor quits, and the container that the executor started
becomes a zombie container.

Any ideas? My Mesos version is 0.28.

Here are some screenshots:


  1. Start a container.
     [inline screenshot, not preserved in the archive]

  2. Restart mesos-slave. Everything is OK.
     [inline screenshot, not preserved in the archive]

  3. Kill the executor container. A zombie container appears.
     [inline screenshot, not preserved in the archive]

How can I fix this?

Thanks,
Xiaodong

From: tommy xiao <xiaods@gmail.com>
Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
Date: Tuesday, November 22, 2016, 12:32 AM
To: user <user@mesos.apache.org>
Subject: Re: Running mesos-slave in the docker that leave many zombie process

You need `--pid=host`.

2016-11-21 15:01 GMT+08:00 X Brick <ngdocker@gmail.com>:
Thanks @haosdent, let me try it.

2016-11-21 14:33 GMT+08:00 haosdent <haosdent@gmail.com>:
Passing the `--pid=host` flag when starting the Docker container may resolve this:
> start the mesos_slave container with "--pid=host" so that it uses the process namespace
> of the host.
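
For reference, a minimal sketch of starting the agent container in the host PID namespace might look like the following. The image name, the container name, the master address, and the extra flags and mounts are assumptions for illustration and do not come from this thread. Only `--pid=host` is the flag under discussion.

```shell
#!/bin/sh
# Sketch only: everything except --pid=host is an assumption.
IMAGE="mesosphere/mesos-slave"            # hypothetical image name
MASTER="zk://zk.example.com:2181/mesos"   # hypothetical master address

# Print the docker command so it can be reviewed before running it.
agent_cmd() {
  echo docker run -d --name mesos-slave \
    --pid=host --net=host --privileged \
    -v /var/run/docker.sock:/var/run/docker.sock \
    "$IMAGE" --master="$MASTER" --containerizers=docker,mesos
}

agent_cmd              # show the command
# eval "$(agent_cmd)"  # uncomment to actually start the container
```

With `--pid=host` the container shares the host's process namespace instead of getting its own.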

On Mon, Nov 21, 2016 at 2:30 PM, haosdent <haosdent@gmail.com> wrote:
Not sure if it is related to this issue: https://github.com/mesosphere/docker-containers/issues/9

On Mon, Nov 21, 2016 at 12:27 PM, X Brick <ngdocker@gmail.com> wrote:

Hi,

I ran into a problem when running mesos-slave in a Docker container: it leaves zombie processes
like these.

```
root     10547 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     14505 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     16069 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     19962 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     23346 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     24544 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
```

I found that the zombies come from the mesos-slave process:

```
pstree -p -s 10547
systemd(1)───docker-containe(19448)───mesos-slave(19464)───docker(10547)
```
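
A quick way to confirm which process is failing to reap its children is to count zombies per parent PID. The sketch below assumes a `ps` that accepts `-e -o ppid= -o stat=`. The helper name `zombies_by_parent` is made up for illustration.

```shell
#!/bin/sh
# Count zombie (state Z) processes per parent PID.
# Reads "PPID STAT" pairs on stdin, one process per line,
# and prints "PPID COUNT" lines, largest count first.
zombies_by_parent() {
  awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }' |
    sort -k2,2nr
}

# Live usage: the trailing "=" on each field suppresses the header line.
ps -e -o ppid= -o stat= | zombies_by_parent
```

A parent PID with a steadily growing count, such as the mesos-slave PID in the `pstree` output above, is the process that is not waiting on its exited children.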

The logs were deleted by a cron job a few weeks ago, but I remember many `Failed
to shutdown socket with fd xx: Transport endpoint is not connected` messages in the log.

I reported this in JIRA: https://issues.apache.org/jira/browse/MESOS-6615

Has anyone seen this issue before?



--
Best Regards,
Haosdent Huang



--
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com


