mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jie Yu" <yujie....@gmail.com>
Subject Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
Date Mon, 02 Feb 2015 20:13:27 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30514/#review70603
-----------------------------------------------------------



src/master/flags.hpp
<https://reviews.apache.org/r/30514/#comment115811>

    The maximum rate? Does it make sense to you to rename the flag name as well?
    
    slave_removal_rate_limit?



src/master/master.hpp
<https://reviews.apache.org/r/30514/#comment115816>

    The moment you pass it to the SlaveObserver, the ownership of the pointer is transferred
and you cannot access to that anymore.
    
    I guess here you really don't have many options but using a raw pointer (and let the slave
manages its lifecycle). I guess we did the same thing for Registrar, WhitelistWatcher, Allocator,
etc.



src/master/master.cpp
<https://reviews.apache.org/r/30514/#comment115817>

    See my above comments. You probably want to use the raw pointer and add a NOTE about the
ownership is controlled by the master.
    
    Option<RateLimiter*> _limiter



src/master/master.cpp
<https://reviews.apache.org/r/30514/#comment115849>

    Can we simply the timeout logic and move all the limiter related stuff to `shutdown` function?
    
    ```
    void timeout()
    {
      if (pinged && ++timeouts >= XXX) {
        // Schedule a shutdown of the slave.
        shutdown();
      }
      
      // NOTE: We keep sending PINGs even if a shutdown has
      // been scheduled so that we can cancel the shutdown
      // if the slave becomes reachable again.
      ping();
    }
    
    void shutdown()
    {
      if (limiter.isNone()) {
        LOG(INFO) << "...";
        
        dispatch(master,
                 &Master::shutdownSlave,
                 ...);  
        return;
      }
      
      if (shuttingDown.isSome()) {
        // A shutdown has been scheduled already.
        return;
      }
      
      shuttingDown = limiter.get()->acquire()
        .onAny(defer(self(), &Self::_shutdown, ...);
    }
    
    void _shutdown(const Future<Nothing> future)
    {
      if (future.isReady()) {
        // Shutdown the slave.
      } else if (future.isFailed()) {
        // Not expected to happen?
      } else if (future.isDiscarded()) {
        // Do nothing here, maybe print a message.
      }
      
      shuttingDown = None();
    }
    ```



src/master/master.cpp
<https://reviews.apache.org/r/30514/#comment115842>

    Could you add a NOTE about this function:
    
    ```
    NOTE: This function schedules a shutdown of the slave. The shutdown of a slave is rate
limited (MESOS-XXXX). The shutdown can be cancelled if a pong is received before the actually
shutdown is called.
    ```



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/30514/#comment115852>

    indent?



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/30514/#comment115851>

    indent?


- Jie Yu


On Feb. 2, 2015, 6:51 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30514/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2015, 6:51 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, David Robinson, and Jie Yu.
> 
> 
> Bugs: MESOS-1148
>     https://issues.apache.org/jira/browse/MESOS-1148
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The algorithm is simple. All the slave observers share a rate limiter whose rate is configured
via command line. When a slave times out on health check, a permit is acquired to shutdown
the slave *but* the pings are continued to be sent. If a pong arrives before the permit is
satisifed, the shutdown is cancelled.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d 
>   src/master/master.hpp 337e00aa46ea127f3667e3383d631c3fb8e22f30 
>   src/master/master.cpp 10056861b95ed9453c971787982db7d09f09f323 
>   src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 
>   src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 
> 
> Diff: https://reviews.apache.org/r/30514/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Ran the new tests 100 times
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message