mesos-user mailing list archives

From Qiang Chen <qzsc...@gmail.com>
Subject Re: Failed to shutdown socket with fd xxx
Date Mon, 20 Jun 2016 02:45:16 GMT
Thanks @Haosdent for the link explaining the shutdown errors, so I can 
ignore those.

@Joris,

1. I upgraded from 0.25.0 to 0.28.2 on CentOS 7, which has systemd support.
2. I didn't make any OS / init system changes

Regarding "That indicates a transition from the old systemd lack of support to 
the new support."
 >> What support was lacking? Could you explain in more detail, and how can 
I fix this? Or could there be another cause?
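
To narrow down which recovered executors the warning applies to, a check along these lines might help. It is only a minimal sketch and not the check Mesos performs in linux_launcher.cpp: it assumes that membership in the slice shows up as a path component in /proc/<pid>/cgroup (lines of the form hierarchy-ID:controller-list:cgroup-path), and the helper names are made up for illustration.

```python
from pathlib import Path

# Slice name copied from the warning text in the agent log.
SLICE = "mesos_executors.slice"


def cgroup_paths(cgroup_text):
    """Return the cgroup path from each 'id:controllers:path' line."""
    paths = []
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)  # the path itself may contain ':'
        if len(parts) == 3:
            paths.append(parts[2])
    return paths


def in_slice(cgroup_text, slice_name=SLICE):
    """True if any cgroup path has slice_name as one of its components."""
    return any(slice_name in path.split("/")
               for path in cgroup_paths(cgroup_text))


def pid_in_slice(pid, slice_name=SLICE):
    """Read /proc/<pid>/cgroup (Linux only); False if it cannot be read."""
    try:
        text = Path(f"/proc/{pid}/cgroup").read_text()
    except OSError:
        return False
    return in_slice(text, slice_name)
```

Calling pid_in_slice(42322) on the agent host would then report whether that executor pid from the warning is listed under the slice, under the assumption stated above.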

Thanks again!


On 2016-06-17 21:31, Joris Van Remoortere wrote:
> The shutdown errors are not the issue.
> The concerning part is this warning:
>
>     W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find
>     pid '42322' in 'mesos_executors.slice'. This can lead to lack of
>     proper resource isolation
>
> That indicates a transition from the old systemd lack of support to 
> the new support.
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Fri, Jun 17, 2016 at 2:35 PM, haosdent <haosdent@gmail.com> wrote:
>
>     Hi, @Qiang.
>
>     @Joseph has a nice explanation of the "Shutdown failed on fd" errors:
>     http://search-hadoop.com/m/0Vlr6pe7qb2MJX8B1&subj=Re+Benign+Shutdown+failed+on+fd+error+messages
>     Those errors can be ignored.
>
>     For
>     ```
>     I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM
>     events
>     for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>     ```
>
>     This is a normal info log; it appears when the Mesos CgroupMemIsolator
>     registers OOM hooks for your containers.
>
>     On Fri, Jun 17, 2016 at 8:22 PM, Joris Van Remoortere
>     <joris@mesosphere.io> wrote:
>
>     > Can you provide:
>     > 1. The version that you are upgrading from.
>     > 2. Whether you made any OS / init system changes alongside this
>     upgrade
>     > (just to narrow the scope).
>     >
>     > It is possible that you are upgrading from a version that did
>     not have
>     > systemd support to one that does. If so, the upgrade may require
>     restarting
>     > the tasks (either by themselves, or just starting a fresh
>     agent). Please
>     > check out some of the work in MESOS-3007 to get a better
>     understanding of
>     > what the issue I am referring to is.
>     >
>     > If you can verify that you are making one of these transitions
>     from a bad
>     > world to a good world, then you can devise a plan for your upgrade.
>     >
>     > Joris
>     >
>     > —
>     > *Joris Van Remoortere*
>     > Mesosphere
>     >
>     > On Fri, Jun 17, 2016 at 8:28 AM, Qiang Chen <qzschen@gmail.com> wrote:
>     >
>     > > Hi all,
>     > >
>     > > I ran into an issue when upgrading mesos-slave to 0.28.2.
>     > >
>     > > During the stage that recovers the mesos-slave / framework
>     > > containers, it produced the following errors.
>     > >
>     > >
>     > > ```
>     > > Log file created at: 2016/06/15 15:01:43
>     > > Running on machine: mesos-slave-online005-xxx.cloud.xxx.domain
>     > > Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid
>     file:line] msg
>     > > W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't
>     find pid
>     > > '42322' in 'mesos_executors.slice'. This can lead to lack of
>     proper
>     > > resource isolation
>     > > W0615 15:01:43.286182  4182 linux_launcher.cpp:197] Couldn't
>     find pid
>     > > '42312' in 'mesos_executors.slice'. This can lead to lack of
>     proper
>     > > resource isolation
>     > > W0615 15:01:43.286669  4182 linux_launcher.cpp:197] Couldn't
>     find pid
>     > > '42309' in 'mesos_executors.slice'. This can lead to lack of
>     proper
>     > > resource isolation
>     > > W0615 15:01:43.287144  4182 linux_launcher.cpp:197] Couldn't
>     find pid
>     > > '42304' in 'mesos_executors.slice'. This can lead to lack of
>     proper
>     > > resource isolation
>     > > W0615 15:01:43.287636  4182 linux_launcher.cpp:197] Couldn't
>     find pid
>     > > '42300' in 'mesos_executors.slice'. This can lead to lack of
>     proper
>     > > resource isolation
>     > > W0615 15:01:43.288120  4182 linux_launcher.cpp:197] Couldn't
>     find pid
>     > > '42317' in 'mesos_executors.slice'. This can lead to lack of
>     proper
>     > > resource isolation
>     > > E0615 15:01:43.471676  4201 process.cpp:1958] Failed to
>     shutdown socket
>     > > with fd 24: Transport endpoint is not connected
>     > > E0615 15:01:43.476007  4201 process.cpp:1958] Failed to
>     shutdown socket
>     > > with fd 24: Transport endpoint is not connected
>     > > E0615 15:01:43.476143  4201 process.cpp:1958] Failed to
>     shutdown socket
>     > > with fd 24: Transport endpoint is not connected
>     > > E0615 15:01:43.476272  4201 process.cpp:1958] Failed to
>     shutdown socket
>     > > with fd 24: Transport endpoint is not connected
>     > > E0615 15:01:43.476483  4201 process.cpp:1958] Failed to
>     shutdown socket
>     > > with fd 24: Transport endpoint is not connected
>     > > E0615 15:01:43.476618  4201 process.cpp:1958] Failed to
>     shutdown socket
>     > > with fd 24: Transport endpoint is not connected
>     > >
>     > > ```
>     > >
>     > > And it will also cause the OOM errors, such as:
>     > >
>     > > ```
>     > > I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for
>     OOM events
>     > > for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>     > > I0615 15:01:43.325469 4172 mem.cpp:722] Started listening on
>     low memory
>     > > pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>     > > I0615 15:01:43.326004  4172 mem.cpp:722] Started listening on
>     medium
>     > > memory pressure events for container
>     f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>     > > I0615 15:01:43.326539  4172 mem.cpp:722] Started listening on
>     critical
>     > > memory pressure events for container
>     f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>     > >
>     > > ```
>     > >
>     > > Has anyone else run into this? Thanks.
>     > >
>     > > --
>     > > Best Regards,
>     > > Chen, Qiang
>     > >
>     > >
>     >
>
>
>
>     --
>     Best Regards,
>     Haosdent Huang
>
>

-- 
Best Regards,
Chen, Qiang

