aurora-reviews mailing list archives

From Bill Farner <wfar...@apache.org>
Subject Re: Review Request 62956: Immediately reject offers lacking necessary resources
Date Wed, 18 Oct 2017 19:49:25 GMT


> On Oct. 13, 2017, 1:12 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 67-68 (patched)
> > <https://reviews.apache.org/r/62956/diff/2/?file=1854107#file1854107line67>
> >
> >     As far as I know this will filter this agent entirely for 30 days. This comes pretty close to leaking agents. https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L1207-L1209
> >     
> >     This implies the timeout would need to be significantly smaller (e.g. ~3 minutes) and configurable for operators. At that point, I am no longer sure the optimization would help at Twitter-scale clusters.

> this will filter this agent entirely for 30 days

Unfortunately that log statement lies!  The agent is not filtered, but the _resources_ are
filtered for future consideration unless they increase.
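
A toy model of that behavior, as I understand it, is sketched below. The class and method names (`RefusedResourcesSketch`, `isFiltered`) and the plain name-to-amount maps are hypothetical and only illustrate the idea; this is not the Mesos allocator code. The assumption is that an offer from the agent stays filtered only while every offered amount is no larger than what was previously declined.

```java
import java.util.Map;

public class RefusedResourcesSketch {
  /**
   * Returns true when the offered amounts all fit within the previously
   * refused amounts, i.e. nothing has increased since the decline.
   * Both maps go from a resource name (e.g. "cpus") to a scalar amount.
   */
  public static boolean isFiltered(
      Map<String, Double> refused,
      Map<String, Double> offered) {

    for (Map.Entry<String, Double> entry : offered.entrySet()) {
      double previouslyRefused = refused.getOrDefault(entry.getKey(), 0.0);
      if (entry.getValue() > previouslyRefused) {
        // At least one resource grew, so the offer is no longer filtered.
        return false;
      }
    }
    return true;
  }
}
```

Under this model, an agent that frees up CPU after a decline would be offered again, which is why the agent itself is not lost for the filter duration.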


> On Oct. 13, 2017, 1:12 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 220-224 (patched)
> > <https://reviews.apache.org/r/62956/diff/2/?file=1854107#file1854107line220>
> >
> >     This won't work for us.
> >     
> >     We are using both non-revocable and revocable (CPU & RAM) resources. It is crucial for us that we can still use revocable resources on an agent even if the non-revocable resources are maxed out. The same applies vice versa.
> >     
> >     This pseudo code should solve it:
> >     ```
> >     bool lacksUsefulResources(offer):
> >         no_revocable = revocable_mem <= mem_threshold || revocable_cpu <= cpu_threshold
> >         no_non_revocable = mem <= mem_threshold || cpu <= cpu_threshold
> >
> >         return no_revocable && no_non_revocable
> >     ```
> >     
> >     Would that still work for you? 
> >     
> >     
> >     (As a minor improvement of the heuristic we could use the minimal executor resources as thresholds rather than 0)

I believe `ResourceManager.bagFromMesosResources()` does what you want - the resources are
aggregated without regard for the revocable flag.  I explicitly test for this in `OfferManagerImplTest`;
grep for `mixed` to find the test cases.  If you disagree, can you give me a test case that
points out the issue?
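
To make the aggregation concrete, here is a minimal sketch of the idea. Everything in it (`OfferFilterSketch`, `OfferedResource`, `aggregate`, `lacksUsefulResources`) is a hypothetical stand-in written for this thread, not the actual `ResourceManager` or `OfferManager` code, and it assumes a zero threshold for both CPU and RAM.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class OfferFilterSketch {
  public enum ResourceType { CPUS, RAM_MB }

  /** Hypothetical stand-in for one resource entry in a Mesos offer. */
  public static final class OfferedResource {
    final ResourceType type;
    final double amount;
    final boolean revocable;

    public OfferedResource(ResourceType type, double amount, boolean revocable) {
      this.type = type;
      this.amount = amount;
      this.revocable = revocable;
    }
  }

  /** Sums amounts per resource type, deliberately ignoring the revocable flag. */
  public static Map<ResourceType, Double> aggregate(List<OfferedResource> resources) {
    Map<ResourceType, Double> bag = new EnumMap<>(ResourceType.class);
    for (ResourceType type : ResourceType.values()) {
      bag.put(type, 0.0);
    }
    for (OfferedResource resource : resources) {
      bag.merge(resource.type, resource.amount, Double::sum);
    }
    return bag;
  }

  /** An offer is rejected early only when the aggregate has no CPU or no RAM. */
  public static boolean lacksUsefulResources(List<OfferedResource> resources) {
    Map<ResourceType, Double> bag = aggregate(resources);
    return bag.get(ResourceType.CPUS) <= 0.0 || bag.get(ResourceType.RAM_MB) <= 0.0;
  }
}
```

With this shape, an offer carrying only revocable CPU and RAM is kept, because the revocable amounts count toward the totals just like the non-revocable ones.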


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62956/#review187939
-----------------------------------------------------------


On Oct. 12, 2017, 4:18 p.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62956/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2017, 4:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> There's no reason for us to evaluate offers with no CPUs or memory, so reject them early in the offer lifecycle.
> 
> This is an incremental performance optimization, but it may net significant improvements based on observations in some very large clusters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/http/Utilization.java 3c77e2983ce00f897f3d5ed106b779cd7f7f0940
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java e8334310a2a46a0ccb09ee6e4122c515892d3996
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java 1b1239753f40d7d46d91724def6c25037eb79f1c
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java d5db81b88a0369d0b26c8fbf70efab3886ad7695
>   src/main/java/org/apache/aurora/scheduler/stats/TaskStatCalculator.java b98aaaf48ae60afef19a368ee96abc897300f8fa
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 2cfdc090ff75a63111ae146c9fe7b3542e7ac83f
>   src/test/java/org/apache/aurora/scheduler/offers/Offers.java 129b4437315c6ad4ea47ca75d4ae6e28cadd7911
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 765a527acb96997989c920be8b69dfa1113dc302
> 
> 
> Diff: https://reviews.apache.org/r/62956/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Bill Farner
> 
>

