mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jie Yu" <yujie....@gmail.com>
Subject Re: Review Request 30514: Rate limited the removal of slaves failing health checks.
Date Wed, 04 Feb 2015 20:14:17 GMT


> On Feb. 4, 2015, 8:01 p.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 197-246
> > <https://reviews.apache.org/r/30514/diff/5/?file=847032#file847032line197>
> >
> >     It looks like we only log when we're shutting down the slave, when there's no
rate limiter?
> >     
> >     Have you considered having a single shutdown code path? This gets you a single
dispatch, and a single log statement and it looks like it's less control flow?
> >     
> >     ```
> >       void shutdown()
> >       {
> >         if (shuttingDown.isNone()) {
> >           Future<Nothing> future = Nothing();
> >     
> >           if (limiter.isSome()) {
> >             LOG(INFO) << "Scheduling shutdown of slave " << slaveId
> >                       << " due to health check timeout";
> >     
> >             future = limiter.get()->acquire();
> >           }
> >     
> >           shuttingDown = future.onAny(defer(self(), &Self::_shutdown));
> >     
> >           ++metrics->slave_shutdowns_scheduled;
> >         }
> >       }
> >     
> >       void _shutdown()
> >       {
> >         CHECK_SOME(shuttingDown);
> >     
> >         const Future<Nothing>& future = shuttingDown.get();
> >     
> >         CHECK(!future.isFailed()) << future.failure();
> >     
> >         if (future.isReady()) {
> >           LOG(INFO) << "Shutting down slave " << slaveId
> >                     << " due to health check timeout";
> >     
> >           dispatch(master,
> >                    &Master::shutdownSlave,
> >                    slaveId,
> >                    "health check timed out");
> >         } else if (future.isDiscarded()) {
> >           // Shutdown was canceled.
> >           LOG(INFO) << "Canceling shutdown of slave " << slaveId
> >                     << " since a pong is received!";
> >     
> >           ++metrics->slave_shutdowns_canceled;
> >         }
> >     
> >         shuttingDown = None();
> >       }
> >     ```
> 
> Jie Yu wrote:
>     Having both 'future' and 'shuttingDown' in 'shutdown' method looks confusing to me:)
You need to comment about why you need that future. Or, I think having dup code is not a too
bad thing in this case as the logic is more clear IMO.

Another option to avoid dup code is:

1) s/shutdown/scheduleShutdown
2) let 'shutdown' be the actual shutdown (sending message)


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30514/#review71021
-----------------------------------------------------------


On Feb. 4, 2015, 1:51 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30514/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 1:51 a.m.)
> 
> 
> Review request for mesos, Ben Mahler, David Robinson, and Jie Yu.
> 
> 
> Bugs: MESOS-1148
>     https://issues.apache.org/jira/browse/MESOS-1148
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The algorithm is simple. All the slave observers share a rate limiter whose rate is configured
via command line. When a slave times out on health check, a permit is acquired to shutdown
the slave *but* the pings are continued to be sent. If a pong arrives before the permit is
satisifed, the shutdown is cancelled.
> 
> 
> Diffs
> -----
> 
>   src/master/flags.hpp 6c18a1af625311ef149b5877b08f63c2b12c040d 
>   src/master/master.hpp cd37ee9d3c57bcd91f08cd402ec1e4a09d9e42ee 
>   src/master/master.cpp d04b2c4041d8fe8978b877f07579a6f907903e1b 
>   src/tests/partition_tests.cpp fea78016268b007590516798eb30ff423fd0ae58 
>   src/tests/slave_tests.cpp e7e2af63da785644f3f7e6e23607c02be962a2c6 
> 
> Diff: https://reviews.apache.org/r/30514/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Ran the new tests 100 times
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message