mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alexander Rojas" <alexan...@mesosphere.io>
Subject Re: Review Request 31114: Fixed flaky MasterTest.SlavesEndpointTwoSlaves caused by incorrect FUTURE_PROTOBUF usage.
Date Wed, 18 Feb 2015 09:04:53 GMT


> On Feb. 17, 2015, 7:13 p.m., Vinod Kone wrote:
> > src/tests/master_tests.cpp, lines 1616-1624
> > <https://reviews.apache.org/r/31114/diff/2/?file=866348#file866348line1616>
> >
> >     Hmm. I don't understand the fix here.
> >     
> >     First off, any expectations should be set *before* the corresponding actions
(e.g., StartSlave).
> >     
> >     I think you just want to move the AWAIT_READY(slaveRegisteredMessage) before
the second StartSlave()? IOW,
> >     
> >     ```
> >     // Start the first slave and wait for it to be registered.
> >     Future<SlaveRegisteredMessage> slave1RegisteredMessage =
> >       FUTURE_PROTOBUF(SlaveRegisteredMessage(), master.get(), _);
> >     
> >     StartSlave();
> >     AWAIT_READY(slave1RegisteredMessage);
> >     
> >     // Start the second slave and wait for it to be registered.
> >     Future<SlaveRegisteredMessage> slave2RegisteredMessage =
> >       FUTURE_PROTOBUF(SlaveRegisteredMessage(), master.get(), _);
> >     
> >     StartSlave();
> >     AWAIT_READY(slave2RegisteredMessage);
> >     ```
> 
> Alexander Rojas wrote:
>     That doesn't work, since the future will be ready before the second state starts
by the second `SlaveRegisteredMessage` sent to the first slave. If you want to try it, just
run `./bin/mesos-tests.sh --gtest_filter="MasterTest.SlavesEndpoint*" --gtest_repeat=1000
--gtest_break_on_failure` on the master and you will see what I mean.
> 
> Alexander Rojas wrote:
>     Sorry about how short the first message was. You are right in the sense that the
expectations should be set before, but they break the test eventually. Assuming the code you
posted, consider the following timeline:
>     
>     1. Expectation on `slave1RegisteredMessage` set.
>     2. First slave starts, it registers to the server.
>     3. The delay for retry registration ([here](https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1092))
is almost 0, slave sends a second registration request to the master.
>     4. `AWAIT_READY(slave1RegisteredMessage)`, message arrives to the slave, future set,
first slave registered.
>     5. The master replies to the request of 3.
>     6. Expectation on `slave2RegisteredMessage` set.
>     7. Message from master arrives from 3, Future `slave2RegisteredMessage` set.
>     8. Second slave starts.
>     9. `AWAIT_READY(slave2RegisteredMessage)` doesn't wait, it was fulfilled in 6. Filled
with data not related to the second slave. Test fails, we are not sure slave2 is running.
> 
> Vinod Kone wrote:
>     I see. Can you use a 'Not()' matcher (e.g., Not(slave1.get()) for the 2nd expectation?

Thank's for the tip about the 'Not()' matcher. The fix now actually looks elegant.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31114/#review72741
-----------------------------------------------------------


On Feb. 17, 2015, 5:36 p.m., Alexander Rojas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31114/
> -----------------------------------------------------------
> 
> (Updated Feb. 17, 2015, 5:36 p.m.)
> 
> 
> Review request for mesos, Ben Mahler, Till Toenshoff, and Vinod Kone.
> 
> 
> Bugs: MESOS-2355
>     https://issues.apache.org/jira/browse/MESOS-2355
> 
> 
> Repository: mesos-incubating
> 
> 
> Description
> -------
> 
> Sets the "to" end to all calls to `FUTURE_PROTOBUF` within the MasterTest.SlavesEndpointTwoSlaves.

> 
> Failing to do that causes the test to fail on very rare ocasions, due to the first slave
registering twice to the server (this is normal behavior according to [slave.cpp](https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1087)
and [master.cpp](https://github.com/apache/mesos/blob/master/src/master/master.cpp#L2912).
> 
> It those ocasions, where the first slave registers twice, the master's sends a reponse
to both registration requests before the second slave's turn, thus setting the future which
was expecting data for the second slave with incorrect data.
> 
> 
> Diffs
> -----
> 
>   src/tests/master_tests.cpp 18eabd4 
> 
> Diff: https://reviews.apache.org/r/31114/diff/
> 
> 
> Testing
> -------
> 
> /bin/mesos-tests.sh --gtest_filter="MasterTest.SlavesEndpoint*" --gtest_repeat=1000 --gtest_break_on_failure
> 
> 
> Thanks,
> 
> Alexander Rojas
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message