mesos-issues mailing list archives

From "Greg Mann (Jira)" <>
Subject [jira] [Commented] (MESOS-10068) Mesos Master doesn't send AGENT_REMOVED when removing agent from internal state
Date Tue, 28 Jan 2020 19:26:00 GMT


Greg Mann commented on MESOS-10068:

[~daltonmatos] regarding this ticket, yea I think it makes sense to close this one and mention
it in MESOS-10089.

Time is tight over here, but I'd be happy to mentor you a bit in the codebase :) Would you
like to start by addressing MESOS-10089? If so, we could do an intro call to get started.
Feel free to find me on Mesos slack if you're on there.

> Mesos Master doesn't send AGENT_REMOVED when removing agent from internal state
> -------------------------------------------------------------------------------
>                 Key: MESOS-10068
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.7.3, 1.8.2, 1.9.1
>            Reporter: Dalton Matos Coelho Barreto
>            Priority: Major
>         Attachments: master-full-logs.log
> Hello,
> Looking at the documentation of the master {{/api/v1}} endpoint, the description of the
{{SUBSCRIBE}} call says that only {{TASK_ADDED}} and {{TASK_UPDATED}} events are supported for
this endpoint, but when a new agent joins the cluster an {{AGENT_ADDED}} event is received.
> The problem is that when this agent is stopped, the {{AGENT_REMOVED}} event is not received
by clients subscribed to the master API.
> I tested this behavior with versions {{1.7.3}}, {{1.8.2}} and {{1.9.1}}, all using the
docker image {{mesos/mesos-centos}}.
> The only time I saw an {{AGENT_REMOVED}} event was when a new agent joined the cluster
but the master could not communicate with it. In that specific test a firewall was
blocking port {{5051}} on the slave, so nobody was able to talk to the slave on
that port.
> h2. Here are the steps to reproduce the problem
>  * Start a new Mesos master
>  * Connect to the {{/api/v1}} endpoint, sending a {{SUBSCRIBE}} message:
>  ** 
> {noformat}
> curl --no-buffer -Ld '{"type": "SUBSCRIBE"}' -H "Content-Type: application/json" http://MASTER_IP:5050/api/v1{noformat}
>  * Start a new slave and confirm the {{AGENT_ADDED}} event is delivered;
>  * Stop this slave;
>  * Check that {{/slaves?slave_id=AGENT_ID}} returns a JSON response with the field {{active=false}};
>  * Wait for the Mesos master to stop listing this slave, that is, until {{/slaves?slave_id=AGENT_ID}}
returns an empty response.
> Even after the empty response, the event never reaches the subscriber.
> The Mesos master logs show this:
> {noformat}
>  I1213 15:03:10.338935    13 master.cpp:1297] Agent 2cd23025-c09d-401b-8f26-9265eda8f800-S1
at slave(1)@ (86813ca2a964) disconnected
> I1213 15:03:10.339089    13 master.cpp:3399] Disconnecting agent 2cd23025-c09d-401b-8f26-9265eda8f800-S1
at slave(1)@ (86813ca2a964)
> I1213 15:03:10.339207    13 master.cpp:3418] Deactivating agent 2cd23025-c09d-401b-8f26-9265eda8f800-S1
at slave(1)@ (86813ca2a964)
> {noformat}
> And then:
> {noformat}
> W1213 15:04:40.726670    15 process.cpp:1917] Failed to send 'mesos.internal.PingSlaveMessage'
to '', connect: Failed to connect to No route to host{noformat}
> And some time after this:
> {noformat}
> I1213 15:04:37.685007     7 hierarchical.cpp:900] Removed agent 2cd23025-c09d-401b-8f26-9265eda8f800-S1
> {noformat}
> Even after this removal, the {{AGENT_REMOVED}} event is not delivered.
> I will attach the full master logs also.
> Do you think this could be a bug?
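
For anyone reproducing the steps quoted above, the curl subscription can also be done from a small script that filters the stream for agent events. The sketch below is only an illustration and is not taken from the ticket. It assumes the master streams JSON events in RecordIO framing (the record length in bytes, a newline, then the record), and the function names read_recordio, subscribe and agent_removed_events are made up for this example. It also assumes the URL points at the leading master, since urllib may not follow a redirect for a POST the way curl -L does.

```python
import json
import urllib.request
from typing import BinaryIO, Iterable, Iterator


def read_recordio(stream: BinaryIO) -> Iterator[dict]:
    """Yield JSON events from a stream framed as b'<length>\\n<payload>'."""
    while True:
        header = stream.readline()
        if not header:
            return  # connection closed cleanly between records
        remaining = int(header.strip())
        payload = b""
        # read() may return fewer bytes than requested, so loop until done.
        while remaining > 0:
            chunk = stream.read(remaining)
            if not chunk:
                return  # stream ended in the middle of a record
            payload += chunk
            remaining -= len(chunk)
        yield json.loads(payload)


def subscribe(master_url: str) -> Iterator[dict]:
    """POST a SUBSCRIBE call to <master_url>/api/v1 and yield events.

    Assumption: master_url (for example http://MASTER_IP:5050) is the
    leading master, so no redirect has to be followed.
    """
    request = urllib.request.Request(
        master_url + "/api/v1",
        data=json.dumps({"type": "SUBSCRIBE"}).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        yield from read_recordio(response)


def agent_removed_events(events: Iterable[dict]) -> Iterator[dict]:
    """Keep only the events whose type is AGENT_REMOVED."""
    return (event for event in events if event.get("type") == "AGENT_REMOVED")
```

Iterating over agent_removed_events(subscribe("http://MASTER_IP:5050")) while stopping the slave should print nothing if the behavior described in the ticket is present, and one event per removed agent otherwise.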

This message was sent by Atlassian Jira
