mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ben Mahler" <benjamin.mah...@gmail.com>
Subject Re: Review Request 26846: Updated status update manager to forward updates via slave.
Date Fri, 17 Oct 2014 18:23:39 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26846/#review57168
-----------------------------------------------------------


Looks great!

Main two comments from below are:

(1) Could we now removing the SUM <-> Slave coupling by passing a function to the SUM,
instead of the pid? (scheduler.hpp and low_level_scheduler_libprocess.cpp will provide an
example).

(2) Do we want to introduce `pause()` and `resume()` (as opposed to just `flush()`) on the
SUM to avoid all the dropped update logging for disconnected slaves?


src/master/master.cpp
<https://reviews.apache.org/r/26846/#comment97671>

    Should we have a TODO here to start using 'from' in a later version (0.22.0)? Is there
value in keeping the 'pid' field now that it's sent via the slave, and acks go through the
master?



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97690>

    Hm.. this was a bit tricky to grasp, should this be a CHECK now, to reflect that a 'cleanup'
slave should not be (re-)registering?



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97698>

    In line with my other comments, do we want an equivalent `pause()` ability on the SUM
to avoid all the dropped logging?
    
    At which point `flush()` would become `resume()`.



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97697>

    Should there be any comment here about the case when the slave already received an ack
for this particular update (because of the race)?



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97695>

    Any value to this full state CHECK? Or is it for consistency?



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97694>

    Is this really a WARNING? Sounds like it's going to happen a lot..?
    
    Will this dominate the logs for disconnected slaves?



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97700>

    Hm.. maybe do a fresh re-work of this entire comment to avoid the redundant mention of
the slave / framework terminating cases?



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97702>

    Maybe this conditional is a bit more intuitive the other way?
    
    if (state == TERMINATING && frameworks.empty()) {
      ...
    }
    
    Because it seems that one thinks of this as "we're terminating, and we've removed the
last framework as part of our termination":
    
    ```
    // Still terminating..
    state==TERMINATING   frameworks.empty()==false
    // Still terminating..
    state==TERMINATING   frameworks.empty()==false
    // Done!
    state==TERMINATING   frameworks.empty()==true
    ```
    
    Or am I misinterpreting how to think about this?



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97705>

    Should this be an else if with an else that fails due to unknown recover flag?
    
    Or a CHECK inside the else that recover == "cleanup"?
    
    Just want to enforce the assumption that it's "cleanup" here.



src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97706>

    Thank you for the comment about non-local reasoning needed here :)
    
    Maybe add a tiny bit about the timeout? Or say that the executor shutdown "process" began
in `_recover()`? The part that is still a bit non-local IMO is that it's the signal _and_
the timeout that guarantee slave shutdown completes?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/26846/#comment97689>

    Do we still need to pass `flags` during initialize or can it be done in the SUM constructor?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/26846/#comment97688>

    I'm curious, could we start reducing the coupling to the `Slave` here?
    
    Now that we no longer have the hard dependency on the slave `PID` for status update messages,
I'm thinking we can just pass in a function callback for forwarding:
    
    ```
    SUM::initialize(const function<void(const StatusUpdate&)>& forward);
    
    Slave::initialize()
    {
      ...
      sum->initialize(defer(self(), &Slave::forward, lambda::_1));
      ...
    }
    ```
    
    In this first step, the SUM no longer needs to know anything about `Slave`.
    
    As a second step, we could remove the need for `initialize` entirely by providing a streaming
updates method, which would leverage a Stream abstraction in libprocess:
    
    ```
    Stream<StatusUpdate> SUM::updateStream();
    
    Slave::initialize()
    {
      ...
      sum->updateStream()
        .next(defer(self(), &Slave::forward, lambda::_1));
      ...
    }
    
    Slave::forward(Stream<StatusUpdate> updates)
    {
      ...
    
      updates
        .next(defer(self(), &Slave::forward, lambda::_1));
    }
    ```
    
    But this one is distant. :)
    
    Thoughts? Probably drop some TODOs but the first one would be nice to tackle. :)


- Ben Mahler


On Oct. 17, 2014, 12:24 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26846/
> -----------------------------------------------------------
> 
> (Updated Oct. 17, 2014, 12:24 a.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> All the updates from the status update manager are funnelled via the slave. No more sending
updates in the middle of recovery or disconnection. Yay.
> 
> Had to cleanup flags.recover = "cleanup" semantics because the updates are no longer
sent when the slave is disconnected. 
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 0a5c9a374062a241c90ea238725fbb8dd2408ef4 
>   src/slave/slave.hpp 342b09fc084c20d98d096bb129830440179c092c 
>   src/slave/slave.cpp 0e342ed35e3db3b68f9f32b6cf4ace23e4a4db38 
>   src/slave/status_update_manager.hpp 24e3882ad1969c20a64daf90e408618c310e9a81 
>   src/slave/status_update_manager.cpp 5d5cf234ef2dd47fa4b1f67be761dbca31659451 
>   src/tests/slave_recovery_tests.cpp 4fb357bd55f69f71193e92fd03765b808f932d33 
> 
> Diff: https://reviews.apache.org/r/26846/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message