From: Joseph Wu
Date: Mon, 18 Jul 2016 11:17:51 -0700
Subject: Re: What will happen in maintenance mode
To: user
Reply-To: user@mesos.apache.org
In-Reply-To: <578C9C87.6040009@gmail.com>

My guess is that your agents don't match the machines you specified. Note: The maintenance endpoints in Mesos allow you to specify maintenance against non-existent machines, because the operator may add agents on those machines in the future.

In Mesos' maintenance primitives, a "machine" is a hostname + IP. (A physical/virtual machine can hold multiple agents.) The response in /maintenance/status is in terms of machines, not agents. If none of your frameworks support inverse offers, then you won't get any useful information from the /maintenance/status endpoint.
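
To make the hostname + IP pairing concrete, here is a minimal Python sketch that builds a request body for POST /maintenance/schedule. The field names (windows, machine_ids, unavailability, nanoseconds) follow my reading of the Mesos maintenance documentation, so please check them against your Mesos version. The helper name build_schedule and the example values are made up for illustration.

```python
import json


def build_schedule(machines, start_ns, duration_ns):
    """Build a one-window maintenance schedule for (hostname, ip) pairs."""
    return {
        "windows": [
            {
                # Each machine is identified by hostname AND ip, not by agent ID.
                "machine_ids": [
                    {"hostname": hostname, "ip": ip} for hostname, ip in machines
                ],
                "unavailability": {
                    "start": {"nanoseconds": start_ns},
                    "duration": {"nanoseconds": duration_ns},
                },
            }
        ]
    }


# Example values only: one machine and a one-hour window.
schedule = build_schedule(
    [("foo-bar", "127.0.0.1")],
    start_ns=1468865872 * 10**9,
    duration_ns=3600 * 10**9,
)
print(json.dumps(schedule, indent=2))
```

If the hostname or the IP here differs from what the agent registered with, the master will most likely still accept the schedule, but it will refer to a machine that has no agents on it.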

You can figure out an agent's hostname/IP by hitting the /master/slaves endpoint:

{
  "slaves": [
    {
      "pid": "slave(1)@127.0.0.1:5051",
      "hostname": "foo-bar",
      ...

^ The above translates to a machine = { "hostname": "foo-bar", "ip": "127.0.0.1" }
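
As a rough sketch of that translation, the Python below derives a machine from each /master/slaves entry and lists the scheduled machines that no registered agent maps to. It assumes the pid always has the form name@ip:port, as in the example above, and it compares hostname and IP as exact strings, which may be stricter than what the master does. The function names and the second scheduled entry are hypothetical.

```python
def machine_from_agent(agent):
    """Derive a maintenance "machine" from one /master/slaves entry."""
    # pid looks like "slave(1)@127.0.0.1:5051"; keep only the IP part.
    address = agent["pid"].split("@", 1)[1]
    ip = address.rsplit(":", 1)[0]
    return {"hostname": agent["hostname"], "ip": ip}


def unmatched_machines(scheduled, agents):
    """Return the scheduled machines that no registered agent maps to."""
    known = {(m["hostname"], m["ip"]) for m in map(machine_from_agent, agents)}
    return [m for m in scheduled if (m["hostname"], m["ip"]) not in known]


# Agent entry taken from the example response above.
agents = [{"pid": "slave(1)@127.0.0.1:5051", "hostname": "foo-bar"}]
scheduled = [
    {"hostname": "foo-bar", "ip": "127.0.0.1"},
    {"hostname": "foo-bar", "ip": "10.0.0.5"},  # hypothetical mismatch
]
print(unmatched_machines(scheduled, agents))
```

Any machine this prints is one you scheduled for maintenance that does not correspond to a running agent, which would explain why nothing was shut down.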

On Mon, Jul 18, 2016 at 2:08 AM, Qiang Chen <qzschen@gmail.com> wrote:
> Hi all,
>
> I'm puzzled about how maintenance mode works.
>
> I see this on the Mesos [doc site](http://mesos.apache.org/documentation/latest/maintenance/):
>
> ```
> When maintenance is triggered by the operator, all agents on the machine are told to shutdown. These agents are removed from the master, which means that a TASK_LOST status update will be sent for every task running on each of those agents. The scheduler driver’s slaveLost callback will also be invoked for each of the removed agents. Any agents on machines in maintenance are also prevented from re-registering with the master in the future (until maintenance is completed and the machine is brought back up).
> ```
>
> But I didn't see the agent machines shut down or any tasks fail when I tested the maintenance HTTP endpoints.
>
> If Mesos agents are in that mode, will the running tasks be moved to other agents? That is, will all the tasks on those agents be evacuated and the agents then shut down?
>
> When I POST to "/maintenance/schedule" and "/machine/down" with a proper maintenance time window, the response from GET "/maintenance/status" shows the specified agents in the "draining_machines" and "down_machines" lists, but the agents didn't shut down and no tasks were evacuated. Why? Is this expected?
>
> Thanks.
>
> -- 
> Best Regards,
> Chen, Qiang
