aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Ly <jordan....@gmail.com>
Subject Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache
Date Mon, 30 Oct 2017 18:14:49 GMT


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 321 (patched)
> > <https://reviews.apache.org/r/63199/diff/4/?file=1865170#file1865170line327>
> >
> >     `expireAfterWrite` doesn't result in LRU.  I think you mean `expireAfterAccess`.

LRU is maintained if the operator specifies a maximum size (https://github.com/google/guava/wiki/CachesExplained#size-based-eviction).

I use `expireAfterWrite` because offers will not be valid after `minOfferHoldTime`. I believe
both strategies can exist simultaneously.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java
> > Lines 42-43 (patched)
> > <https://reviews.apache.org/r/63199/diff/4/?file=1865171#file1865171line42>
> >
> >     How about a `CacheBuilderSpec` to bundle these?

I think that using a `CacheBuilderSpec` would cause some odd string manipulation from our
argument names to the ones `CacheBuilderSpec` uses.

I could just build the `CacheBuilder` in `OfferModule`/`OfferSettings` and pass that into
`OfferManager` instead. Which do you think would be cleaner?


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java
> > Lines 46-50 (patched)
> > <https://reviews.apache.org/r/63199/diff/4/?file=1865171#file1865171line46>
> >
> >     We've developed a [stutter](https://michaelwhatcott.com/go-code-that-stutters/)
over time in this class.  Can you do some cleanup and s/offer// on parameters, fields, and
methods?

Done.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
> > Lines 104 (patched)
> > <https://reviews.apache.org/r/63199/diff/4/?file=1865172#file1865172line104>
> >
> >     `0` is a valid size for guava's `CacheBuilder#maximumSize`:
> >     > When size is zero, elements will be evicted immediately after being loaded
into the cache. This can be useful in testing, or to disable caching temporarily without a
code change.
> >     
> >     Let's allow it, and introduce a `NotNegativeNumber` counterpart to `PositiveNumber`.

Added.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
> > Lines 109 (patched)
> > <https://reviews.apache.org/r/63199/diff/4/?file=1865172#file1865172line109>
> >
> >     s/Long/long/

Done.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
> > Line 178 (original), 188-189 (patched)
> > <https://reviews.apache.org/r/63199/diff/4/?file=1865172#file1865172line188>
> >
> >     Is this needed only because of the default `offerStaticBanCacheMaxSize = Long.MAX_VALUE`?
 The double eviction strategy intuitively seems unnecessary.  I suggest a finite default for
`offerStaticBanCacheMaxSize` (say, 1k) and no time expiry.

Spoke offline, but added a comment to explain the reasoning behind the dual eviction strategy.


> On Oct. 27, 2017, 10:03 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java
> > Lines 247-249 (original), 253-255 (patched)
> > <https://reviews.apache.org/r/63199/diff/4/?file=1865174#file1865174line253>
> >
> >     IMHO this test case is no longer interesting now that we have to punch through
with `cleanupStaticBans()`.  The test now reads as "test that clearing a cache clears the
cache".  I suggest removing the test case.

`cleanupWithStaticBans()` only ensures that the `size()` method returns the correct value
by not counting evicted entries (i.e. the entry we just expired in the test will not be counted).
I think this test is still useful as it ensures the eviction on expiration strategy works
as intended. I added a comment to `cleanupWithStaticBans()` for clarity.


- Jordan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review189488
-----------------------------------------------------------


On Oct. 30, 2017, 6:14 p.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2017, 6:14 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb,
and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Using the new `hold_offers_forever` option, it is possible for the `staticallyBannedOffers`
to grow very large in size as we never release offers.
> 
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing `staticallyBannedOffers`
into a LRU cache which expires entries after `min_offer_hold_time` + `offer_hold_jitter_window`
(referred to as `maxOfferHoldTime`), while also taking an option for a maximum size for the
cache. I believe that this approach has a couple of benefits:
> 
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. Entries will
no longer be removed when the offer is used, but they will be removed within `maxOfferHoldTime`.
This means cluster operators will not have to think about the new `offer_static_ban_cache_max_size`
if they aren't affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers indefinitely
can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job updates.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java e0ec793cad05674fb4b65246a6d153521b28b914

>   src/main/java/org/apache/aurora/scheduler/config/validators/NotNegativeNumber.java
PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 7011a4cc9eea827cdd54698aaed1a653774bce7f

>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java e060f2073dce4d2486d1ee2bfd873fe75167c6d0

>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java e6b2c55e4f33f9a603157236766425edcaff10e7

>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 244422c6b3ac6a2f7b4690cdc0f3440170b2567f

>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 3d38a5929a0255a980db30eeca0f059a2029f321

> 
> 
> Diff: https://reviews.apache.org/r/63199/diff/5/
> 
> 
> Testing
> -------
> 
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) `staticallyBannedOffers` memory leak
fixed with correct options and b) lowered assignment time for quickly recurring crons and
rescheduled jobs.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message