aurora-reviews mailing list archives

From Maxim Khutornenko <ma...@apache.org>
Subject Re: Review Request 46603: Introduce command line option to control the offer filter duration
Date Mon, 25 Apr 2016 17:07:03 GMT


> On April 24, 2016, 3:48 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java, line 51
> > <https://reviews.apache.org/r/46603/diff/2/?file=1358596#file1358596line51>
> >
> >     Does this default value effect the same behavior as before the patch?
> 
> Stephan Erb wrote:
>     Using a default of `0` is indeed a behaviour change. I am happy to discuss whether we want this change or not.
>     
>     With a timeout of `5` secs (this was the former hardcoded default):
>     
>     * When launching a task, Mesos will only re-offer the unused resources in the offer after 5 seconds.
>     * When declining offers in order to merge two offers into one, Mesos will only re-offer resources of this slave after 5 seconds.
>     
>     With a timeout of `0` secs:
>     
>     * The resources can be returned instantly within the next offer-cycle of the Mesos allocator.
>     
>     We tend to have the problem that a timeout of 5 seconds breaks the maintenance feature for us. We regularly schedule jobs with #instances > #nodes in the cluster. In this case, all available offers are quickly depleted and Aurora begins to schedule onto nodes which were supposed to be put into maintenance mode. Only after the timeout of 5 seconds has passed will Mesos re-offer resources to Aurora. I believe we might not be the only ones with this problem and therefore think 0 is a good default.

It would be great to reach out to the Mesos folks to better understand the reasons behind
choosing a 5 second default timeout. Last I checked, lower values _may_ result in an increased
load on the Mesos master. If that proves to be true, I'd prefer holding on to the current
behavior as a safer bet.
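
To make the trade-off concrete, here is a minimal sketch of how the filter duration delays
the earliest possible re-offer. It is a simplified model, not Aurora or Mesos code: the class
and method names are hypothetical, the allocator is assumed to run in fixed cycles, and the
1 second cycle length used in `main` is an assumed value for illustration only.

```java
import java.time.Duration;

/** Hypothetical model of offer-filter timing; not part of Aurora or Mesos. */
public final class OfferFilterSketch {
  private OfferFilterSketch() {}

  /**
   * Returns the earliest time (measured from time zero) at which declined resources
   * could be offered again, assuming allocation cycles run at multiples of
   * allocationInterval and the filter expires filterDuration after the decline.
   */
  static Duration earliestReoffer(
      Duration declinedAt, Duration filterDuration, Duration allocationInterval) {
    long interval = allocationInterval.toMillis();
    if (interval <= 0) {
      throw new IllegalArgumentException("allocationInterval must be positive");
    }
    long expiry = declinedAt.plus(filterDuration).toMillis();
    // Round up to the first allocation cycle at or after the filter expiry.
    long cycles = (expiry + interval - 1) / interval;
    return Duration.ofMillis(cycles * interval);
  }

  public static void main(String[] args) {
    Duration declinedAt = Duration.ofMillis(300);  // assumed decline time
    Duration interval = Duration.ofSeconds(1);     // assumed allocation cycle length

    // Filter duration of 0 seconds: resources can return in the next cycle.
    System.out.println(earliestReoffer(declinedAt, Duration.ZERO, interval));         // prints PT1S
    // Filter duration of 5 seconds: resources stay withheld until the filter expires.
    System.out.println(earliestReoffer(declinedAt, Duration.ofSeconds(5), interval)); // prints PT6S
  }
}
```

The model says nothing about master load. It only shows how long resources stay unavailable
to the framework for a given filter duration, which is the part that affects the maintenance
case described above.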


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130306
-----------------------------------------------------------


On April 23, 2016, 4:35 p.m., Stephan Erb wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46603/
> -----------------------------------------------------------
> 
> (Updated April 23, 2016, 4:35 p.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Bill Farner.
> 
> 
> Bugs: AURORA-1658
>     https://issues.apache.org/jira/browse/AURORA-1658
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Aurora is declining Mesos offers implicitly when launching a task and explicitly when compacting multiple offers of a slave into a single one.
> The filter duration instructs Mesos to return the declined resources to us only after a timeout of X seconds, even if there is no other framework that wants them. If no filter is supplied, the hardcoded default of 5 seconds would be used.
> 
> By making this value configurable, Aurora can be tuned for either single or multi-framework deployment.
> 
> 
> Diffs
> -----
> 
>   RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f
>   src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7
>   src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57
>   src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1
> 
> Diff: https://reviews.apache.org/r/46603/diff/
> 
> 
> Testing
> -------
> 
> * ./gradlew -Pq build 
> * ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>  
> I have also conducted an (unscientific) benchmark in Vagrant and started a job with 5 instances and recorded the time from `PENDING` to `RUNNING` for the slowest ones:
> 
> * 7s startup time for a filter duration of 0 seconds
> * 29s startup time for the hardcoded former default of 5 seconds
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>

