mesos-dev mailing list archives

From "Vinod Kone (JIRA)" <>
Subject [jira] [Created] (MESOS-714) Slave should check if the (re-)registered is from the expected master
Date Tue, 01 Oct 2013 04:08:23 GMT
Vinod Kone created MESOS-714:

             Summary: Slave should check if the (re-)registered is from the expected master
                 Key: MESOS-714
             Project: Mesos
          Issue Type: Bug
            Reporter: Vinod Kone
            Assignee: Vinod Kone
             Fix For: 0.15.0

The following sequence of events happened in production at Twitter.

--> Slave registered with master A
--> A sent an ACK for registration but died immediately (user restart)
--> Slave detected a new master B and sent a re-register request
--> Slave then received the delayed ACK from A.
--> The bug here is that the slave accepted this ACK even though it was not from the current master (B).
--> Master B ignored the re-register request because it didn't yet know it was the master!
--> Slave never retried its registration because it thinks it's registered with B.

At this point the slave thinks it is registered, but master B has no record of it!

Fix: Slaves should check that (re-)registered messages are from the expected master pid.
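
The proposed check can be sketched roughly as follows. This is a minimal toy model in Python, not the actual Mesos slave code (which is C++); the class name `Slave`, the methods `detected` and `on_registered`, and the pid strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slave:
    """Toy model of a slave that tracks which master it expects to hear from."""

    expected_master: Optional[str] = None  # pid of the most recently detected master
    registered: bool = False

    def detected(self, master_pid: str) -> None:
        # A new master was detected: any earlier registration is stale.
        self.expected_master = master_pid
        self.registered = False

    def on_registered(self, sender_pid: str) -> bool:
        # Accept the ACK only if it comes from the master we currently expect.
        if sender_pid != self.expected_master:
            return False  # stale ACK: stay unregistered so retries continue
        self.registered = True
        return True


# Replay of the sequence above (pids are made up for the example).
slave = Slave()
slave.detected("master@A")
slave.detected("master@B")                        # A died, B was detected
stale_accepted = slave.on_registered("master@A")  # delayed ACK from A
registered_after_stale = slave.registered
fresh_accepted = slave.on_registered("master@B")  # genuine ACK from B
```

With this check the delayed ACK from A is dropped, so the slave remains unregistered and keeps retrying until B is ready to answer.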

This message was sent by Atlassian JIRA
