incubator-mesos-dev mailing list archives

From "Vinod Kone (JIRA)" <>
Subject [jira] [Assigned] (MESOS-305) Inform the framework about a master failover
Date Wed, 20 Mar 2013 00:43:15 GMT


Vinod Kone reassigned MESOS-305:

    Assignee: Benjamin Hindman

This seems to be causing a slew of LOST tasks @Twitter whenever a master fails over.

[~benjaminhindman] would you have some time to take a look at this and see if there is
a short-term fix for this? IIUC, we were waiting on the leader detector refactor before fixing
> Inform the framework about a master failover
> --------------------------------------------
>                 Key: MESOS-305
>                 URL:
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Vinod Kone
>            Assignee: Benjamin Hindman
> With the recent changes in the master detector code, we no longer send 'NoMasterDetected'
> to the scheduler driver, which in turn means the 'disconnected' scheduler callback is never
> At Twitter this manifested as a spew of LOST tasks whenever a master failover happens.
> This is because the scheduler holds on to offers for a while and never learns that the
> offers are invalid until after tasks are launched. Though this is a race, it is ideal to minimize
> this window as much as possible by informing the scheduler of the master failover.
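
To illustrate the window described above, here is a minimal sketch of a scheduler that caches offers and drops them when told about a disconnect. It is not the Mesos API: the class OfferCachingScheduler and its methods are hypothetical, and only the callback name 'disconnected' is taken from the issue text. It assumes offers arrive as (offer id, resources) pairs.

```python
# Hypothetical sketch, not the Mesos scheduler API. Only the callback
# name "disconnected" comes from the issue description.
class OfferCachingScheduler:
    def __init__(self):
        # Offers the scheduler is holding on to: offer id -> resources.
        self.pending_offers = {}

    def resource_offers(self, offers):
        """Cache incoming offers instead of using them right away."""
        for offer_id, resources in offers:
            self.pending_offers[offer_id] = resources

    def disconnected(self):
        """Drop every held offer once a master failover is reported.

        Returns the number of offers discarded.
        """
        dropped = len(self.pending_offers)
        self.pending_offers.clear()
        return dropped

    def launch(self, offer_id):
        """Try to launch on a held offer; False if it is no longer held."""
        return self.pending_offers.pop(offer_id, None) is not None


scheduler = OfferCachingScheduler()
scheduler.resource_offers([("o1", {"cpus": 1}), ("o2", {"cpus": 2})])
scheduler.launch("o1")    # offer still held, so the launch is attempted
scheduler.disconnected()  # failover reported, remaining offers dropped
scheduler.launch("o2")    # refused locally instead of ending up LOST
```

If the 'disconnected' callback never fires, the cache is never cleared, and the last launch would go out against a stale offer, which matches the LOST tasks reported here.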

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
