mesos-dev mailing list archives

From "Vinod Kone" <vinodk...@gmail.com>
Subject Re: Review Request 26699: Updated slave re-registration to send unacknowledged task states.
Date Thu, 16 Oct 2014 04:12:23 GMT


> On Oct. 15, 2014, 4:03 a.m., Adam B wrote:
> > src/slave/slave.cpp, lines 938-948
> > <https://reviews.apache.org/r/26699/diff/1/?file=720970#file720970line938>
> >
> >     Couldn't the Slave and the SUM get out of sync here? Right now, the SUM will flush its pending status updates as soon as a new master is detected.
> >     I'm imagining a scenario where the SUM is flushing status updates and the slave handles a status ACK interleaved with a slave re-registration delivering stale or out-of-sync task states.
> >     Wouldn't it just be better if the SUM didn't flush until after the slave has successfully re-registered?
> 
> Vinod Kone wrote:
>     Definitely thought about this race. 
>     
>     Yes, it would be better if SUM did the flush after re-registration but I think it is still a race because re-registration could happen due to ZK blips where updates and acks are in flight.
>     
>     I added a comment on why it is safe. Let me know if you still have concerns.

Actually, after thinking more about this and discussing with BenM, there is fundamentally
still a race between the slave sending its re-registration message and the SUM sending its
update. To fix this, I will make a couple of changes.
1) Instead of the SUM sending updates directly to the master, it will send them to the
slave, which will forward them to the master.
2) The slave updates the latest state of the task when it gets an update from the executor,
and the unacknowledged state when it gets an update from the SUM.
3) The latest state in StatusUpdate will be set by the slave (instead of the SUM) before
forwarding it to the master.

These changes should guarantee that re-registration and update messages are always in sync
w.r.t. the latest and unacknowledged state of a task. As a side benefit of 1), updates won't
be sent to the master while the slave is in the middle of re-registration.
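
To make the three changes concrete, here is a rough sketch of the bookkeeping in C++.
Every name in it (Slave, TaskInfo, forwardFromSUM, reregisterMessage, and so on) is
hypothetical and the types are heavily simplified; it is not the actual Mesos code or the
patch in this review. It also assumes the SUM retries any update that is not acknowledged,
so dropping a forward during re-registration is safe.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified types; not the real Mesos classes.
// UNKNOWN is a sentinel meaning "not set".
enum class TaskState { UNKNOWN, STAGING, RUNNING, FINISHED };

struct StatusUpdate {
  std::string taskId;
  TaskState state;        // State this particular update reports.
  TaskState latestState;  // Filled in by the slave, see change 3).
};

struct TaskInfo {
  TaskState latest;          // Set when the executor sends an update.
  TaskState unacknowledged;  // Set when the SUM hands over an update.
};

class Slave {
public:
  bool reregistering = false;
  std::map<std::string, TaskInfo> tasks;
  std::vector<StatusUpdate> sentToMaster;  // Stand-in for the network.

  // Change 2), first half: executor updates move the latest state.
  void updateFromExecutor(const StatusUpdate& update) {
    tasks[update.taskId].latest = update.state;
  }

  // Change 1): the SUM sends to the slave, which forwards to the master.
  void forwardFromSUM(StatusUpdate update) {
    TaskInfo& task = tasks[update.taskId];

    // Change 2), second half: record the unacknowledged state.
    task.unacknowledged = update.state;

    // Side benefit of 1): nothing goes out mid re-registration.
    // (Assumption: the SUM retries the update later.)
    if (reregistering) {
      return;
    }

    // Change 3): the slave, not the SUM, stamps the latest state.
    update.latestState = task.latest;
    sentToMaster.push_back(update);
  }

  // Re-registration reports both states from the same bookkeeping that
  // forwardFromSUM() uses, so the two kinds of message cannot disagree.
  std::map<std::string, TaskInfo> reregisterMessage() const {
    return tasks;
  }
};

// Example: the executor has already reported FINISHED while the SUM is
// still delivering the older, unacknowledged RUNNING update.
Slave exampleScenario() {
  Slave slave;
  slave.updateFromExecutor({"t1", TaskState::RUNNING, TaskState::UNKNOWN});
  slave.updateFromExecutor({"t1", TaskState::FINISHED, TaskState::UNKNOWN});
  slave.forwardFromSUM({"t1", TaskState::RUNNING, TaskState::UNKNOWN});
  return slave;
}
```

In exampleScenario() the master would receive a RUNNING update stamped with latest state
FINISHED, and a re-registration at that point would report the same pair (latest FINISHED,
unacknowledged RUNNING), since both are read from the slave's own map.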

Thoughts?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26699/#review56604
-----------------------------------------------------------


On Oct. 15, 2014, 9:29 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26699/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2014, 9:29 p.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen.
> 
> 
> Bugs: MESOS-1799 and MESOS-1817
>     https://issues.apache.org/jira/browse/MESOS-1799
>     https://issues.apache.org/jira/browse/MESOS-1817
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Slave re-registration now sends both the latest state and unacknowledged state to the master.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 342b09fc084c20d98d096bb129830440179c092c 
>   src/slave/slave.cpp 0e342ed35e3db3b68f9f32b6cf4ace23e4a4db38 
>   src/tests/fault_tolerance_tests.cpp a75910d4f486230ba3f1d8927e5f1e5fda6e287b 
>   src/tests/slave_tests.cpp f585bdd20ae1af466f2c1b4d85331ac67451552f 
> 
> Diff: https://reviews.apache.org/r/26699/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Ran new test 1000 times.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>

