Subject: Re: Failed to shutdown socket with fd xxx
From: Joris Van Remoortere
To: "dev@mesos.apache.org"
Cc: user
Reply-To: user@mesos.apache.org
Date: Fri, 17 Jun 2016 15:31:49 +0200
The shutdown errors are not the issue.
The concerning part is this warning:

> W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid '42322' in 'mesos_executors.slice'. This can lead to lack of proper resource isolation

That indicates a transition from an older version that lacked systemd support to a newer one that supports it.
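
To see which executors are affected, the pids named in these warnings can be compared with the pids currently listed in the slice. The following is a minimal Python sketch, not Mesos's own recovery code. It parses lines using the glog layout quoted in the log further down ("[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg"). It also assumes a cgroup v1 host whose systemd hierarchy is mounted at /sys/fs/cgroup/systemd. That path and all helper names are illustrative assumptions.

```python
import re
from typing import Iterable, Set

# glog line layout: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^:\s]+:\d+)\] (?P<msg>.*)$"
)

# Text of the linux_launcher.cpp warning as it appears in this thread.
MISSING_PID = re.compile(r"Couldn't find pid '(\d+)' in 'mesos_executors\.slice'")


def pids_reported_missing(log_lines: Iterable[str]) -> Set[int]:
    """Collect the pids named in 'Couldn't find pid' warnings (W lines only)."""
    pids: Set[int] = set()
    for line in log_lines:
        m = GLOG_LINE.match(line.rstrip("\n"))
        if not m or m.group("severity") != "W":
            continue  # header lines, INFO/ERROR lines, etc.
        hit = MISSING_PID.search(m.group("msg"))
        if hit:
            pids.add(int(hit.group(1)))
    return pids


def parse_cgroup_procs(text: str) -> Set[int]:
    """Parse the contents of a cgroup.procs file (one pid per line)."""
    return {int(tok) for tok in text.split()}


def read_slice_pids(
    path: str = "/sys/fs/cgroup/systemd/mesos_executors.slice/cgroup.procs",
) -> Set[int]:
    # ASSUMPTION: cgroup v1 with the systemd hierarchy mounted at this path.
    with open(path) as f:
        return parse_cgroup_procs(f.read())
```

For example, `pids_reported_missing(open("mesos-slave.WARNING")) - read_slice_pids()` would list executors that are still outside the slice; the log file name is hypothetical.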

--
Joris Van Remoortere
Mesosphere

On Fri, Jun 17, 2016 at 2:35 PM, haosdent <haosdent@gmail.com> wrote:
Hi, @Qiang.

@Joseph gave a good explanation of the "Shutdown failed on fd" errors here:
http://search-hadoop.com/m/0Vlr6pe7qb2MJX8B1&subj=Re+Benign+Shutdown+failed+on+fd+error+messages
Those errors can be ignored.
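
The text of these errors is the standard message for ENOTCONN, which shutdown(2) returns when the socket is not (or is no longer) connected, for example because the peer has already gone away. Here is a minimal Python illustration on a TCP socket that never connected. It is not libprocess code; it only shows where the error text comes from.

```python
import errno
import os
import socket


def shutdown_unconnected_errno() -> int:
    """Call shutdown() on a never-connected TCP socket; return the errno (0 if none)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return 0  # shutdown() unexpectedly succeeded


if __name__ == "__main__":
    code = shutdown_unconnected_errno()
    # With glibc on Linux this is expected to print:
    #   ENOTCONN: Transport endpoint is not connected
    print(f"{errno.errorcode.get(code, code)}: {os.strerror(code)}")
```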

As for these log lines:
```
I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
```

These are normal INFO-level log messages; they are emitted when the Mesos CgroupMemIsolator registers OOM hooks for your containers.
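
For background, the cgroup v1 memory controller delivers OOM and memory-pressure notifications through an eventfd registered via the cgroup's cgroup.event_control file. The three pressure levels in the log (low, medium, critical) are the levels that interface exposes. Below is a minimal sketch of that registration, following the kernel's cgroup-v1 memory documentation. It is not Mesos's code; the function names are hypothetical, and it needs Linux plus Python 3.10+ for os.eventfd.

```python
import os
from typing import Optional, Tuple


def event_control_line(event_fd: int, target_fd: int, args: Optional[str] = None) -> bytes:
    """Build the "<event_fd> <target_fd> [args]" string written to cgroup.event_control."""
    parts = [str(event_fd), str(target_fd)]
    if args:
        parts.append(args)  # e.g. a pressure level: "low", "medium", "critical"
    return " ".join(parts).encode("ascii")


def register_memory_event(cgroup_dir: str, control_file: str,
                          args: Optional[str] = None) -> Tuple[int, int]:
    """Register an eventfd for notifications on `control_file` in `cgroup_dir`.

    Returns (eventfd, control_file_fd); keep both open while listening.
    Reading 8 bytes from the eventfd blocks until the kernel signals an event.
    """
    efd = os.eventfd(0)  # Linux only, Python 3.10+
    target = -1
    try:
        target = os.open(os.path.join(cgroup_dir, control_file), os.O_RDONLY)
        ctl = os.open(os.path.join(cgroup_dir, "cgroup.event_control"), os.O_WRONLY)
        try:
            os.write(ctl, event_control_line(efd, target, args))
        finally:
            os.close(ctl)
    except OSError:
        # Do not leak descriptors if any step fails.
        os.close(efd)
        if target >= 0:
            os.close(target)
        raise
    return efd, target
```

An OOM hook would correspond to `register_memory_event(cg, "memory.oom_control")`, and each pressure listener to `register_memory_event(cg, "memory.pressure_level", level)`. Here `cg` is the container's memory cgroup directory; its exact path under Mesos is not given in this thread.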

On Fri, Jun 17, 2016 at 8:22 PM, Joris Van Remoortere <joris@mesosphere.io>
wrote:

> Can you provide:
> 1. The version that you are upgrading from.
> 2. Whether you made any OS / init system changes alongside this upgrade
> (just to narrow the scope).
>
> It is possible that you are upgrading from a version that did not have
> systemd support to one that does. If so, the upgrade may require restarting
> the tasks (either individually, or by simply starting a fresh agent).
> Please check out some of the work in MESOS-3007 to better understand the
> issue I am referring to.
>
> If you can verify that you are making one of these transitions from a bad
> world to a good world, then you can devise a plan for your upgrade.
>
> Joris
>
> --
> *Joris Van Remoortere*
> Mesosphere
>
> On Fri, Jun 17, 2016 at 8:28 AM, Qiang Chen <qzschen@gmail.com> wrote:
>
> > Hi all,
> >
> > I ran into an issue when upgrading mesos-slave to 0.28.2.
> >
> > While mesos-slave was recovering the framework containers, it
> > produced the following errors.
> >
> >
> > ```
> > Log file created at: 2016/06/15 15:01:43
> > Running on machine: mesos-slave-online005-xxx.cloud.xxx.domain
> > Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> > W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid '42322' in 'mesos_executors.slice'. This can lead to lack of proper resource isolation
> > W0615 15:01:43.286182  4182 linux_launcher.cpp:197] Couldn't find pid '42312' in 'mesos_executors.slice'. This can lead to lack of proper resource isolation
> > W0615 15:01:43.286669  4182 linux_launcher.cpp:197] Couldn't find pid '42309' in 'mesos_executors.slice'. This can lead to lack of proper resource isolation
> > W0615 15:01:43.287144  4182 linux_launcher.cpp:197] Couldn't find pid '42304' in 'mesos_executors.slice'. This can lead to lack of proper resource isolation
> > W0615 15:01:43.287636  4182 linux_launcher.cpp:197] Couldn't find pid '42300' in 'mesos_executors.slice'. This can lead to lack of proper resource isolation
> > W0615 15:01:43.288120  4182 linux_launcher.cpp:197] Couldn't find pid '42317' in 'mesos_executors.slice'. This can lead to lack of proper resource isolation
> > E0615 15:01:43.471676  4201 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
> > E0615 15:01:43.476007  4201 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
> > E0615 15:01:43.476143  4201 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
> > E0615 15:01:43.476272  4201 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
> > E0615 15:01:43.476483  4201 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
> > E0615 15:01:43.476618  4201 process.cpp:1958] Failed to shutdown socket with fd 24: Transport endpoint is not connected
> >
> > ```
> >
> > It also causes OOM errors, such as:
> >
> > ```
> > I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> > I0615 15:01:43.325469  4172 mem.cpp:722] Started listening on low memory pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> > I0615 15:01:43.326004  4172 mem.cpp:722] Started listening on medium memory pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> > I0615 15:01:43.326539  4172 mem.cpp:722] Started listening on critical memory pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> >
> > ```
> >
> > Has anyone else run into this? Thanks.
> >
> > --
> > Best Regards,
> > Chen, Qiang
> >
> >
>



--
Best Regards,
Haosdent Huang
