aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Ly <jordan....@gmail.com>
Subject Re: Review Request 63763: Fix flaky MesosCallbackHandlerTest
Date Mon, 13 Nov 2017 23:45:29 GMT


> On Nov. 13, 2017, 9:43 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java
> > Line 335 (original), 337 (patched)
> > <https://reviews.apache.org/r/63763/diff/1/?file=1890883#file1890883line339>
> >
> >     Can we get away with `MoreExecutors.directExecutor()` instead?
> 
> Jordan Ly wrote:
>     I don't believe so. This test depends on a delay between `execute(...)` and the runnable
being executed. Is there some way to control when `directExecutor()` executes actions (ie.
can it execute actions not immediately)?
> 
> Bill Farner wrote:
>     How about use `directExecutor()` and just move this line
>     ```java
>     handler.handleRescind(OFFER_ID);
>     ```
>     
>     before this one
>     ```java
>     handler.handleOffers(ImmutableList.of(HOST_OFFER.getOffer()));
>     ```
>     
>     The behavior is ~identical, and as a result the test body reads in a way that matches
the test name.
>     
>     I'd also advocate for the removal of the strict mock.

I think this test is a bit complex and may require additional documentation, but it is testing
some nuanced interaction between Mesos and Aurora that does not allow the reordering of handleOffers
and handleRescind.

Mesos guarantees ordered message delivery. Therefore, it should be impossible for us to see
`handleOffers` with ID_A after a `handleRescind` for ID_A. 

We try to process rescinds synchronously to avoid situations where a rescind could come in
but have to wait on the executor queue and the offer that is rescinded is used in the interim.
However, it is also possible to have the offer to be rescinded still be on the queue to be
added when the rescind come in. Therefore, the rescind may not see the offer to remove. To
solve the latter scenario, we want to ban the offer that has arrived from being used (hence
`globallyBannedOffers`) immediately, and add an unban on the executor queue later (to keep
the banned offer size from expanding).

This test handles this complex scenario which requires us to have access on when individual
actions on the executor queue are run.

Some context: https://reviews.apache.org/r/61804/

I use a `strictMock` since I expect an exact ordering of the calls. I could possibly add additional
stats within handleRescind that I can check when the asynchronous unban happens if that is
preferable?


- Jordan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63763/#review190886
-----------------------------------------------------------


On Nov. 13, 2017, 7:32 p.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63763/
> -----------------------------------------------------------
> 
> (Updated Nov. 13, 2017, 7:32 p.m.)
> 
> 
> Review request for Aurora and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> In the same vein as: https://reviews.apache.org/r/63760/
> 
> Fix a flaky test that uses `Thread.sleep` by injecting a fake Executor.
> 
> 
> Diffs
> -----
> 
>   src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 8f8b86dfb5d53439671ca59e6c42245b31fc6136

> 
> 
> Diff: https://reviews.apache.org/r/63763/diff/1/
> 
> 
> Testing
> -------
> 
> `./gradlew test`
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message