mesos-issues mailing list archives

From Stéphane Cottin (JIRA) <j...@apache.org>
Subject [jira] [Updated] (MESOS-5893) mesos-executor should adopt and reap orphan child processes
Date Sat, 30 Jul 2016 16:29:20 GMT

     [ https://issues.apache.org/jira/browse/MESOS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stéphane Cottin updated MESOS-5893:
-----------------------------------
    Description: 
The Mesos containerizer does not properly handle the death of child processes.

Discovered using marathon-lb: each topology update forks another haproxy. The old haproxy process should exit once its last client connection is terminated, but it turns into a zombie instead.

{noformat}
 7716 ?        Ssl    0:00  |       \_ mesos-executor --launcher_dir=/usr/libexec/mesos --sandbox_directory=/mnt/mesos/sandbox --user=root --working_directory=/marathon-lb --rootfs=/mnt/mesos/provisioner/containers/3b381d5c-7490-4dcd-ab4b-81051226075a/backends/overlay/rootfses/a4beacac-2d7e-445b-80c8-a9b4e480c491
 7813 ?        Ss     0:00  |       |   \_ sh -c /marathon-lb/run sse --marathon https://marathon:8443 --auth-credentials user:pass --group 'external' --ssl-certs /certs --max-serv-port-ip-per-task 20050
 7823 ?        S      0:00  |       |   |   \_ /bin/bash /marathon-lb/run sse --marathon https://marathon:8443 --auth-credentials user:pass --group external --ssl-certs /certs --max-serv-port-ip-per-task 20050
 7827 ?        S      0:00  |       |   |       \_ /usr/bin/runsv /marathon-lb/service/haproxy
 7829 ?        S      0:00  |       |   |       |   \_ /bin/bash ./run
 8879 ?        S      0:00  |       |   |       |       \_ sleep 0.5
 7828 ?        Sl     0:00  |       |   |       \_ python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg --ssl-certs /certs --command sv reload /marathon-lb/service/haproxy --sse --marathon https://marathon:8443 --auth-credentials user:pass --group external --max-serv-port-ip-per-task 20050
 7906 ?        Zs     0:00  |       |   \_ [haproxy] <defunct>
 8628 ?        Zs     0:00  |       |   \_ [haproxy] <defunct>
 8722 ?        Ss     0:00  |       |   \_ haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 144 52
{noformat}

Update: mesos-executor should register itself as a child subreaper (PR_SET_CHILD_SUBREAPER, see http://man7.org/linux/man-pages/man2/prctl.2.html) and propagate signals to its children.
Code sample: https://github.com/krallin/tini/blob/master/src/tini.c


        Summary: mesos-executor should adopt and reap orphan child processes (was: mesos-executor terminated forked children turn to zombies)

> mesos-executor should adopt and reap orphan child processes
> -----------------------------------------------------------
>
>                 Key: MESOS-5893
>                 URL: https://issues.apache.org/jira/browse/MESOS-5893
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.1.0
>         Environment: mesos compiled from git master (1.1.0)
> {{../configure --enable-ssl --enable-libevent --prefix=/usr --enable-optimize --enable-silent-rules --enable-xfs-disk-isolator}}
> isolators : {{namespaces/pid,cgroups/cpu,cgroups/mem,filesystem/linux,docker/runtime,network/cni,docker/volume}}
>            Reporter: Stéphane Cottin
>              Labels: containerizer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)