mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anand Mazumdar <mazumdar.an...@gmail.com>
Subject Re: Review Request 48387: Delay before initiating a connection with master.
Date Thu, 09 Jun 2016 19:53:52 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48387/#review136869
-----------------------------------------------------------



Thanks for taking this on. Looks pretty good minus a small bug that can lead to a crash (see
review comments).


src/scheduler/constants.hpp (line 28)
<https://reviews.apache.org/r/48387/#comment201950>

    I won't call it a backoff since it's just a _delay_ i.e. a wait time before the scheduler
library tries to establish a connection with the master.
    
    How about: `DEFAULT_CONNECTION_DELAY_MAX`



src/scheduler/flags.hpp (lines 35 - 36)
<https://reviews.apache.org/r/48387/#comment201962>

    How about:
    
    ```cpp
    The maximum amount of time to wait before trying to initiate a connection with the master.
The library waits for a random amount of time between [0, b], where `b = connection_delay_max`
before initiating a (re-)connection attempt with the master.
    ```



src/scheduler/flags.hpp (line 40)
<https://reviews.apache.org/r/48387/#comment201951>

    s/connection_backoff_max/connection_delay_max



src/scheduler/scheduler.cpp (line 73)
<https://reviews.apache.org/r/48387/#comment201963>

    Let's reorder these as per comments from @haosdent.



src/scheduler/scheduler.cpp (line 196)
<https://reviews.apache.org/r/48387/#comment201974>

    Why do we need this variable? Can't we directly use `flags.connection_delay_max`?



src/scheduler/scheduler.cpp (lines 475 - 476)
<https://reviews.apache.org/r/48387/#comment201979>

    How about:
    
    ```
    // Wait for a random duration between 0 and `flags.connection_delay_max` before (re-)connecting
with the master.
    ```



src/scheduler/scheduler.cpp (line 480)
<https://reviews.apache.org/r/48387/#comment201977>

    How about:
    
    ```cpp
    VLOG(1) << "Waiting for " << delay << " before initiating a (re-)connection
attempt with the master";
    ```



src/scheduler/scheduler.cpp (line 481)
<https://reviews.apache.org/r/48387/#comment201985>

    hmmm, this can crash the library in its current form due to a failed invariant check:
    
    Consider this scenario:
    - A new master is detected. You introduce a random amount of delay in connecting with
this master.
    - The new master fails over and a new master is detected before the `delay` can be invoked.
    - The `delay` for the second master is lesser and the library connects with the master.
    - The original `delay` now fires and we fail this `CHECK` invariant on: https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L320
    
    Here is how to get around this:
    
    Let's initialize `connectionId` here and not in `connect()` on L474.
    
    The signature of `connect()` would then change to something like:
    `void connect(const UUID& _connectionId)`
    
    This is how a modified `connect` would look like:
    
    ```cpp
    void connect(const UUID& _connectionId)
    {
      // It is possible that a new master was detected while we were waiting to establish
a connection with the old master.
      if (connectionId != _connectionId) {
        VLOG(1) << "Ignoring connection attempt from stale connection";
        return;
      }
      .... // existing content of `connect()` thereafter.
    ```
    
    Makes sense?



src/scheduler/scheduler.cpp (line 739)
<https://reviews.apache.org/r/48387/#comment201975>

    Kill this.


- Anand Mazumdar


On June 8, 2016, 8:53 a.m., Jose Guilherme Vanz wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48387/
> -----------------------------------------------------------
> 
> (Updated June 8, 2016, 8:53 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, haosdent huang, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The scheduler library has been updated to wait a little deley before
> initiate a connection with the master. The maximum amount of time waited
> by the scheduler is defined by a flag: CONNECTION_BACKOFF_MAX. After
> the master has been detected the scheduler picks a random delay that
> can be between 0 and the CONNECTION_BACKOFF_MAX value. MESOS-5359
> 
> 
> Diffs
> -----
> 
>   src/scheduler/constants.hpp PRE-CREATION 
>   src/scheduler/flags.hpp PRE-CREATION 
>   src/scheduler/scheduler.cpp c79837c93e7329dbfa22e4288b44237f33408ba9 
> 
> Diff: https://reviews.apache.org/r/48387/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jose Guilherme Vanz
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message