Return-Path: X-Original-To: apmail-mesos-dev-archive@www.apache.org Delivered-To: apmail-mesos-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8E3C717AEB for ; Wed, 18 Feb 2015 09:05:11 +0000 (UTC) Received: (qmail 34124 invoked by uid 500); 18 Feb 2015 09:04:55 -0000 Delivered-To: apmail-mesos-dev-archive@mesos.apache.org Received: (qmail 34054 invoked by uid 500); 18 Feb 2015 09:04:55 -0000 Mailing-List: contact dev-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@mesos.apache.org Delivered-To: mailing list dev@mesos.apache.org Received: (qmail 34039 invoked by uid 99); 18 Feb 2015 09:04:55 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Feb 2015 09:04:55 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id D7B5B1C006B; Wed, 18 Feb 2015 09:04:53 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============7178259762222073379==" MIME-Version: 1.0 Subject: Re: Review Request 31114: Fixed flaky MasterTest.SlavesEndpointTwoSlaves caused by incorrect FUTURE_PROTOBUF usage. From: "Alexander Rojas" To: "Till Toenshoff" , "Ben Mahler" , "Vinod Kone" Cc: "Alexander Rojas" , "mesos" Date: Wed, 18 Feb 2015 09:04:53 -0000 Message-ID: <20150218090453.21354.58544@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: "Alexander Rojas" X-ReviewGroup: mesos X-ReviewRequest-URL: https://reviews.apache.org/r/31114/ X-Sender: "Alexander Rojas" References: <20150217181351.21354.82911@reviews.apache.org> In-Reply-To: <20150217181351.21354.82911@reviews.apache.org> Reply-To: "Alexander Rojas" X-ReviewRequest-Repository: mesos-incubating --===============7178259762222073379== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On Feb. 17, 2015, 7:13 p.m., Vinod Kone wrote: > > src/tests/master_tests.cpp, lines 1616-1624 > > > > > > Hmm. I don't understand the fix here. > > > > First off, any expectations should be set *before* the corresponding actions (e.g., StartSlave). > > > > I think you just want to move the AWAIT_READY(slaveRegisteredMessage) before the second StartSlave()? IOW, > > > > ``` > > // Start the first slave and wait for it to be registered. > > Future slave1RegisteredMessage = > > FUTURE_PROTOBUF(SlaveRegisteredMessage(), master.get(), _); > > > > StartSlave(); > > AWAIT_READY(slave1RegisteredMessage); > > > > // Start the second slave and wait for it to be registered. > > Future slave2RegisteredMessage = > > FUTURE_PROTOBUF(SlaveRegisteredMessage(), master.get(), _); > > > > StartSlave(); > > AWAIT_READY(slave2RegisteredMessage); > > ``` > > Alexander Rojas wrote: > That doesn't work, since the future will be ready before the second state starts by the second `SlaveRegisteredMessage` sent to the first slave. If you want to try it, just run `./bin/mesos-tests.sh --gtest_filter="MasterTest.SlavesEndpoint*" --gtest_repeat=1000 --gtest_break_on_failure` on the master and you will see what I mean. > > Alexander Rojas wrote: > Sorry about how short the first message was. You are right in the sense that the expectations should be set before, but they break the test eventually. Assuming the code you posted, consider the following timeline: > > 1. Expectation on `slave1RegisteredMessage` set. > 2. First slave starts, it registers to the server. > 3. The delay for retry registration ([here](https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1092)) is almost 0, slave sends a second registration request to the master. > 4. `AWAIT_READY(slave1RegisteredMessage)`, message arrives to the slave, future set, first slave registered. > 5. The master replies to the request of 3. > 6. Expectation on `slave2RegisteredMessage` set. > 7. Message from master arrives from 3, Future `slave2RegisteredMessage` set. > 8. Second slave starts. > 9. `AWAIT_READY(slave2RegisteredMessage)` doesn't wait, it was fulfilled in 6. Filled with data not related to the second slave. Test fails, we are not sure slave2 is running. > > Vinod Kone wrote: > I see. Can you use a 'Not()' matcher (e.g., Not(slave1.get()) for the 2nd expectation? Thank's for the tip about the 'Not()' matcher. The fix now actually looks elegant. - Alexander ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31114/#review72741 ----------------------------------------------------------- On Feb. 17, 2015, 5:36 p.m., Alexander Rojas wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/31114/ > ----------------------------------------------------------- > > (Updated Feb. 17, 2015, 5:36 p.m.) > > > Review request for mesos, Ben Mahler, Till Toenshoff, and Vinod Kone. > > > Bugs: MESOS-2355 > https://issues.apache.org/jira/browse/MESOS-2355 > > > Repository: mesos-incubating > > > Description > ------- > > Sets the "to" end to all calls to `FUTURE_PROTOBUF` within the MasterTest.SlavesEndpointTwoSlaves. > > Failing to do that causes the test to fail on very rare ocasions, due to the first slave registering twice to the server (this is normal behavior according to [slave.cpp](https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1087) and [master.cpp](https://github.com/apache/mesos/blob/master/src/master/master.cpp#L2912). > > It those ocasions, where the first slave registers twice, the master's sends a reponse to both registration requests before the second slave's turn, thus setting the future which was expecting data for the second slave with incorrect data. > > > Diffs > ----- > > src/tests/master_tests.cpp 18eabd4 > > Diff: https://reviews.apache.org/r/31114/diff/ > > > Testing > ------- > > /bin/mesos-tests.sh --gtest_filter="MasterTest.SlavesEndpoint*" --gtest_repeat=1000 --gtest_break_on_failure > > > Thanks, > > Alexander Rojas > > --===============7178259762222073379==--