aurora-reviews mailing list archives

From Jordan Ly <jordan....@gmail.com>
Subject Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache
Date Mon, 30 Oct 2017 18:14:40 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/
-----------------------------------------------------------

(Updated Oct. 30, 2017, 6:14 p.m.)


Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb, and Bill
Farner.


Changes
-------

Address feedback.


Repository: aurora


Description
-------

With the new `hold_offers_forever` option, `staticallyBannedOffers` can grow very large
because offers are never released.

As an alternative to https://reviews.apache.org/r/63121/, I propose changing `staticallyBannedOffers`
into an LRU cache that expires entries after `min_offer_hold_time` + `offer_hold_jitter_window`
(referred to as `maxOfferHoldTime`) and that also takes an option for the maximum size of the
cache. I believe this approach has a few benefits:

1. The current behavior of `staticallyBannedOffers` is (mostly) preserved. Entries will no
longer be removed when the offer is used, but they will be removed within `maxOfferHoldTime`.
This means cluster operators will not have to think about the new `offer_static_ban_cache_max_size`
option if they aren't affected by the memory leak now.
2. Cluster operators that use Aurora as a single framework and hold offers indefinitely can
cap the size of the cache to avoid the memory leak.
3. Using an LRU cache greatly benefits quickly recurring crons and job updates.
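
To make the proposed behavior concrete, here is a minimal sketch in plain Java of a size-capped
LRU structure whose entries also expire after `maxOfferHoldTime`. It is not the code in the diff:
the class name `ExpiringLruSet`, its methods, and the generic key type are hypothetical, and
expiring relative to insertion time (rather than last access) is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a bounded, expiring LRU set; not the class from the diff. */
final class ExpiringLruSet<K> {
  // Assumed to correspond to maxOfferHoldTime = min_offer_hold_time + offer_hold_jitter_window.
  private final long ttlMillis;
  // Maps each key to the time at which its entry expires.
  private final LinkedHashMap<K, Long> entries;

  ExpiringLruSet(long minHoldMillis, long jitterWindowMillis, final int maxSize) {
    this.ttlMillis = minHoldMillis + jitterWindowMillis;
    // accessOrder=true keeps iteration order from least to most recently used.
    this.entries = new LinkedHashMap<K, Long>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Long> eldest) {
        // Evict the least recently used entry once the size cap is exceeded.
        return size() > maxSize;
      }
    };
  }

  /** Records a ban that expires ttlMillis after nowMillis. */
  void add(K key, long nowMillis) {
    entries.put(key, nowMillis + ttlMillis);
  }

  /** Returns true if the key is present and not yet expired; drops it if expired. */
  boolean contains(K key, long nowMillis) {
    Long expiresAt = entries.get(key); // get() counts as a use for LRU ordering
    if (expiresAt == null) {
      return false;
    }
    if (nowMillis >= expiresAt) {
      entries.remove(key);
      return false;
    }
    return true;
  }

  int size() {
    return entries.size();
  }
}
```

In this sketch a ban disappears either when the cap forces out the least recently used entry or
when it is looked up after its expiry time, whichever comes first. Expired entries that are never
looked up again stay in memory until the cap evicts them, which a production cache would likely
handle with periodic cleanup.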


Diffs (updated)
-----

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java e0ec793cad05674fb4b65246a6d153521b28b914

  src/main/java/org/apache/aurora/scheduler/config/validators/NotNegativeNumber.java PRE-CREATION

  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 7011a4cc9eea827cdd54698aaed1a653774bce7f

  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java e060f2073dce4d2486d1ee2bfd873fe75167c6d0

  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java e6b2c55e4f33f9a603157236766425edcaff10e7

  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 244422c6b3ac6a2f7b4690cdc0f3440170b2567f

  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 3d38a5929a0255a980db30eeca0f059a2029f321



Diff: https://reviews.apache.org/r/63199/diff/5/

Changes: https://reviews.apache.org/r/63199/diff/4-5/


Testing
-------

Unit tests pass.
Deployed on a scale test cluster and saw that a) the `staticallyBannedOffers` memory leak is
fixed when the correct options are set and b) assignment time is lower for quickly recurring
crons and rescheduled jobs.


Thanks,

Jordan Ly

