mesos-issues mailing list archives

From "Benjamin Bannier (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-8430) Race between operation status updates and agent update
Date Wed, 10 Jan 2018 21:43:00 GMT

     [ https://issues.apache.org/jira/browse/MESOS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Bannier updated MESOS-8430:
------------------------------------
    Issue Type: Bug  (was: Task)

> Race between operation status updates and agent update
> ------------------------------------------------------
>
>                 Key: MESOS-8430
>                 URL: https://issues.apache.org/jira/browse/MESOS-8430
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.5.0
>            Reporter: Benjamin Bannier
>
> Currently, there is a possible race between operation status updates triggered by
> a status update manager in the agent and updates to the agent's resources.
> Consider a master failover where an agent has a resource provider with an operation that
> is not yet terminal. Now let the operation succeed and become terminal in the agent, but have
> the master fail over before it processes the update. After the failover, the new master
> would learn about the resource provider's resources via an {{UpdateSlaveMessage}}. Simultaneously,
> a status update manager in the agent could inform the master about the unacknowledged, successful
> operation. If the operation status update arrives at the master before the {{UpdateSlaveMessage}},
> the operation status update handler could attempt to apply the operation to resources the
> master does not yet know about. This would likely trigger a {{CHECK}} failure in a contains
> check in the master.
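
The ordering problem described above can be illustrated with a small toy model. This is only a sketch and not Mesos code: `ToyMaster`, its two handlers, `replay`, and the resource names `raw-disk` and `volume` are all hypothetical, and the raised `RuntimeError` merely stands in for the {{CHECK}} failure.

```python
class ToyMaster:
    """Hypothetical stand-in for a freshly elected master's view of one agent."""

    def __init__(self):
        # Resources the master has learned about so far (empty after failover).
        self.known = set()

    def update_slave(self, resources):
        # Models receipt of an UpdateSlaveMessage: the master learns the
        # resource provider's resources.
        self.known |= set(resources)

    def operation_status_update(self, consumed, converted):
        # Models applying a successful operation. The containment test plays
        # the role of the contains check; the exception stands in for CHECK.
        if not set(consumed) <= self.known:
            raise RuntimeError("operation applied to unknown resources")
        self.known = (self.known - set(consumed)) | set(converted)


def replay(messages):
    """Deliver messages to a fresh ToyMaster in the given arrival order."""
    master = ToyMaster()
    for handler, args in messages:
        getattr(master, handler)(*args)
    return master.known


update = ("update_slave", (["raw-disk"],))
status = ("operation_status_update", (["raw-disk"], ["volume"]))

# Resources are announced first, then the status update: applies cleanly.
expected_order = replay([update, status])

# Status update overtakes the resource announcement: the check fails.
try:
    replay([status, update])
    racy_order_failed = False
except RuntimeError:
    racy_order_failed = True
```

In this model the only difference between the two runs is the arrival order of the two messages, which is the situation the description outlines for a new master after failover.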



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
