mesos-user mailing list archives

From Joseph Wu <jos...@mesosphere.io>
Subject Re: What will happen in maintenance mode
Date Mon, 25 Jul 2016 18:14:35 GMT
There are some cluster environments where nodes do not have an IP or
hostname.  That's why each MachineID must have one OR the other, not one
XOR the other.

There is a note further up the page that explains how Mesos matches
machines to agents:
https://github.com/apache/mesos/blame/3e115accca390663575753279f4400495625cb91/docs/maintenance.md#L135-L142
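
As a rough illustration of the mapping described in the quoted message
below (an agent's "pid" and "hostname" from /master/slaves translate to a
machine), here is a minimal Python sketch. The function name
machine_ids_from_slaves is hypothetical. It works on an already-parsed
response rather than calling Mesos, and it assumes the pid has the form
name@ip:port with an IPv4 address.

```python
import json


def machine_ids_from_slaves(slaves_response):
    """Derive machine entries (hostname + IP) from a parsed /master/slaves response."""
    machines = []
    seen = set()
    for agent in slaves_response.get("slaves", []):
        # A pid looks like "slave(1)@127.0.0.1:5051"; the IP sits between
        # the "@" and the port (assumption: IPv4, no brackets).
        address = agent["pid"].split("@", 1)[1]
        ip = address.rsplit(":", 1)[0]
        # Hostnames are not case-sensitive, so compare them lowercased.
        # Several agents can share one machine; emit each machine once.
        key = (agent["hostname"].lower(), ip)
        if key not in seen:
            seen.add(key)
            machines.append({"hostname": agent["hostname"], "ip": ip})
    return machines


# Sample input taken from the example in the quoted message.
sample = {"slaves": [{"pid": "slave(1)@127.0.0.1:5051", "hostname": "foo-bar"}]}
print(json.dumps(machine_ids_from_slaves(sample)))
# prints [{"hostname": "foo-bar", "ip": "127.0.0.1"}]
```

Comparing the derived entries against the machines you posted to the
maintenance endpoints is one way to check whether they line up with your
agents.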

On Fri, Jul 22, 2016 at 9:34 PM, tommy xiao <xiaods@gmail.com> wrote:

> Yes. In a recent Mesos deployment, when I omitted the hostname and
> specified only the IP, the cluster sometimes did not work because the
> hostname was not correct. So I am also curious about the machine definition:
> "Each machine must have at least a hostname or IP included. The hostname
> is not case-sensitive."
>
> It should be defined as requiring both the hostname and the IP.
>
>
> 2016-07-19 11:38 GMT+08:00 Qiang Chen <qzschen@gmail.com>:
>
>> Thanks Joseph.
>>
>> I saw this from mesos [doc site](
>> http://mesos.apache.org/documentation/latest/maintenance/):
>>
>> "Each machine must have at least a hostname or IP included. The hostname
>> is not case-sensitive."
>>
>> From my test, the statement above is not correct: if I specify only
>> the hostname or only the IP, it does NOT take effect for the agents under
>> maintenance. Specifying both works.
>>
>> On 2016-07-19 02:17, Joseph Wu wrote:
>>
>> My guess is that your agents don't match the machines you specified.
>> Note: The maintenance endpoints in Mesos allow you to specify maintenance
>> against non-existent machines, because the operator may add agents on those
>> machines in the future.
>>
>> In Mesos' maintenance primitives, a "machine" is a hostname + IP.  (A
>> physical/virtual machine can hold multiple agents.)  The response in
>> /maintenance/status is in terms of machines, not agents.  If none of your
>> frameworks support inverse offers, then you won't get any useful
>> information from the /maintenance/status endpoint.
>>
>> You can figure out an agent's hostname/IP by hitting the /master/slaves
>> endpoint:
>>
>> {
>>   "slaves": [
>>     {
>>       "pid":"slave(1)@127.0.0.1:5051",
>>       "hostname":"foo-bar",
>>       ...
>>
>> ^ The above translates to a machine =
>> { "hostname": "foo-bar", "ip": "127.0.0.1" }
>>
>> On Mon, Jul 18, 2016 at 2:08 AM, Qiang Chen <qzschen@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm puzzled about how maintenance mode works.
>>>
>>> I saw this on the Mesos [doc site](
>>> http://mesos.apache.org/documentation/latest/maintenance/):
>>>
>>> ```
>>> When maintenance is triggered by the operator, all agents on the machine
>>> are told to shutdown. These agents are removed from the master, which means
>>> that a TASK_LOST status update will be sent for every task running on
>>> each of those agents. The scheduler driver’s slaveLost callback will
>>> also be invoked for each of the removed agents. Any agents on machines in
>>> maintenance are also prevented from re-registering with the master in the
>>> future (until maintenance is completed and the machine is brought back up).
>>> ```
>>> But I didn't see the agent machines shut down or any tasks fail when I
>>> tested the maintenance HTTP endpoints.
>>>
>>> When Mesos agents are in that mode, will the running tasks be moved to
>>> other agents? That is, will all the tasks on those agents be evacuated,
>>> and will the agents then shut down?
>>>
>>> I POSTed to "/maintenance/schedule" and "/machine/down" with a proper
>>> maintenance time window. GET "/maintenance/status" then listed the
>>> specified agents under "draining_machines" and "down_machines", but
>>> nothing shut down and no tasks were evacuated. Why? Is that expected?
>>>
>>> Thanks.
>>>
>>> --
>>> Best Regards,
>>> Chen, Qiang
>>>
>>>
>>
>> --
>> Best Regards,
>> Chen, Qiang
>>
>>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>
