mesos-dev mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (MESOS-875) A recovering slave should not ignore valid status updates.
Date Wed, 11 Dec 2013 01:50:07 GMT

     [ https://issues.apache.org/jira/browse/MESOS-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler closed MESOS-875.
---------------------------------

    Resolution: Not A Problem

I filed this a bit prematurely:

Here, the executor sent an update while the slave was down. The executor driver caches such an
update so that it can flush it once the slave reconnects to the driver.

However, in this case the executor exited before the executor driver was able to process the
reconnect and re-send the update.

For correctness, executors need to know that they must not send an update and then exit while
the slave is down; otherwise the cached update is never delivered and their tasks will be lost.
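To make the race concrete, here is a minimal toy model in Python. It is not Mesos code: `ToySlave`, `ToyExecutorDriver`, `send_status_update`, `on_slave_reconnected`, `stop` and `run` are all hypothetical names, and the model assumes that a successful delivery stands in for the slave's acknowledgement. It only shows the sequence described above: an update sent while the slave is down stays cached in the driver, is flushed when the slave reconnects, and is lost if the executor exits first.

```python
class ToySlave:
    """Hypothetical stand-in for the slave: it is either down or connected."""

    def __init__(self):
        self.connected = False
        self.received = []  # updates the slave actually handled

    def handle(self, uuid, state):
        if not self.connected:
            raise ConnectionError("slave is down")
        self.received.append((uuid, state))


class ToyExecutorDriver:
    """Hypothetical model of an executor driver that caches undelivered
    status updates and re-sends them once the slave reconnects."""

    def __init__(self, slave):
        self.slave = slave
        self.pending = {}  # uuid -> state; insertion-ordered, not yet delivered
        self.running = True

    def send_status_update(self, uuid, state):
        self.pending[uuid] = state
        self._flush()

    def on_slave_reconnected(self):
        # Only a driver that is still running can process the reconnect.
        if self.running:
            self._flush()

    def stop(self):
        # The executor exits: anything still cached is dropped with it.
        self.running = False
        self.pending.clear()

    def _flush(self):
        for uuid, state in list(self.pending.items()):
            try:
                self.slave.handle(uuid, state)
            except ConnectionError:
                return  # keep it cached until the next reconnect
            # Toy assumption: delivery counts as acknowledgement.
            del self.pending[uuid]


def run(exit_before_reconnect):
    slave = ToySlave()  # the slave starts out down
    driver = ToyExecutorDriver(slave)
    driver.send_status_update("foo", "TASK_FINISHED")  # cached, not delivered
    if exit_before_reconnect:
        driver.stop()  # executor exits while the slave is still down
    slave.connected = True  # slave comes back
    driver.on_slave_reconnected()
    return slave.received
```

With `run(False)` the slave ends up with `[("foo", "TASK_FINISHED")]`; with `run(True)` it receives nothing, which is the lost-task case. The practical takeaway in this model is that an executor should keep running until its pending updates have been delivered.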

> A recovering slave should not ignore valid status updates.
> ----------------------------------------------------------
>
>                 Key: MESOS-875
>                 URL: https://issues.apache.org/jira/browse/MESOS-875
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Benjamin Mahler
>            Assignee: Vinod Kone
>            Priority: Critical
>             Fix For: 0.17.0
>
>
> This is a regression due to the bug fix for MESOS-732: https://reviews.apache.org/r/14616/
> Now that slave recovery is asynchronous, status updates coming from the executors will be ignored since the slave does not know about the framework until recovery is completed.
> Example:
> I1210 20:06:51.633050 54429 slave.cpp:1756] Handling status update TASK_FINISHED (UUID: foo) for task T of framework F from executor(1)@IP:PORT
> W1210 20:06:51.633128 54429 slave.cpp:1766] Ignoring status update TASK_FINISHED (UUID: foo) for task T of framework F for unknown framework F
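The behaviour in the quoted log can be sketched as follows. This is a toy Python model, not the code in slave.cpp: `ToyRecoveringSlave`, `finish_recovery` and `status_update` are hypothetical names. It assumes only what the report states: the slave's framework table is populated when asynchronous recovery completes, so an update for framework F that arrives earlier is ignored as coming from an "unknown framework".

```python
class ToyRecoveringSlave:
    """Hypothetical model of a slave whose known frameworks are only
    restored once (asynchronous) recovery completes."""

    def __init__(self, checkpointed_frameworks):
        self._checkpointed = set(checkpointed_frameworks)
        self.frameworks = set()  # empty until recovery finishes
        self.handled = []
        self.ignored = []

    def finish_recovery(self):
        # Recovery restores the frameworks that were checkpointed earlier.
        self.frameworks |= self._checkpointed

    def status_update(self, framework_id, task_id, state):
        if framework_id not in self.frameworks:
            # Mirrors the "Ignoring status update ... for unknown framework" warning.
            self.ignored.append((framework_id, task_id, state))
            return False
        self.handled.append((framework_id, task_id, state))
        return True


slave = ToyRecoveringSlave({"F"})
early = slave.status_update("F", "T", "TASK_FINISHED")  # during recovery
slave.finish_recovery()
late = slave.status_update("F", "T", "TASK_FINISHED")   # after recovery
```

Here `early` is `False` and `late` is `True`. As the resolution above explains, the early drop is only harmless because the executor driver keeps the update cached and re-sends it after reconnecting.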



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
