aurora-reviews mailing list archives

From Jordan Ly <>
Subject Review Request 63199: Refactor staticallyBannedOffers into a LRU cache
Date Sat, 21 Oct 2017 05:53:47 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb, and Bill

Repository: aurora


With the new `hold_offers_forever` option, `staticallyBannedOffers` can grow very large
because offers are never released.

As an alternative to, I propose changing `staticallyBannedOffers`
into an LRU cache that expires entries after `min_offer_hold_time` + `offer_hold_jitter_window`
(referred to as `maxOfferHoldTime`) and that also takes an option for the maximum size of the
cache. I believe this approach has a few benefits:

1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. Entries will no
longer be removed when the offer is used, but they will be removed within `maxOfferHoldTime`.
This means cluster operators will not have to think about the new `offer_static_ban_cache_max_size`
if they aren't affected by the memory leak now.
2. Cluster operators that use Aurora as a single framework and hold offers indefinitely can
cap the size of the cache to avoid the memory leak.
3. Using an LRU cache greatly benefits quickly recurring crons and job updates.
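
To make the proposed behavior concrete, here is a minimal sketch of a size-bounded LRU cache
whose entries also expire after `maxOfferHoldTime`. It is not the code in this diff: the class
name `StaticBanCache`, its methods, and the string keys in the usage note are hypothetical, and
it uses only `java.util.LinkedHashMap` rather than whatever cache library the patch relies on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration only: an LRU cache of banned keys with a maximum
 * size and a fixed time-to-live measured from when the ban was recorded.
 * Not thread safe; callers pass the current time explicitly for clarity.
 */
final class StaticBanCache<K> {
  private final long maxOfferHoldTimeMs;
  private final LinkedHashMap<K, Long> bannedAt;

  StaticBanCache(final int maxSize, long maxOfferHoldTimeMs) {
    this.maxOfferHoldTimeMs = maxOfferHoldTimeMs;
    // accessOrder=true keeps the least recently used entry first in iteration order.
    this.bannedAt = new LinkedHashMap<K, Long>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Long> eldest) {
        // Evict the least recently used entry once the size cap is exceeded.
        return size() > maxSize;
      }
    };
  }

  /** Records a ban at the given time. */
  void ban(K key, long nowMs) {
    bannedAt.put(key, nowMs);
  }

  /** Returns true if the key is banned and its entry has not yet expired. */
  boolean isBanned(K key, long nowMs) {
    Long writtenAt = bannedAt.get(key); // get() also marks the entry as recently used
    if (writtenAt == null) {
      return false;
    }
    if (nowMs - writtenAt >= maxOfferHoldTimeMs) {
      bannedAt.remove(key); // expired: drop it lazily on lookup
      return false;
    }
    return true;
  }

  int size() {
    return bannedAt.size();
  }
}
```

For example, with a cap of 2 entries and a 1000 ms hold time, banning `"offer-1|group-a"` and
`"offer-2|group-a"`, looking up the first, and then banning `"offer-3|group-a"` evicts the second
entry (the least recently used one), while the first entry still disappears once 1000 ms have
passed since it was recorded. A real scheduler would also need thread safety, which this sketch
leaves out.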


  src/jmh/java/org/apache/aurora/benchmark/ 5a9099bf9dd292249d72bc3a7604fbb3394f30ea

  src/main/java/org/apache/aurora/scheduler/offers/ 7011a4cc9eea827cdd54698aaed1a653774bce7f

  src/main/java/org/apache/aurora/scheduler/offers/ e060f2073dce4d2486d1ee2bfd873fe75167c6d0

  src/main/java/org/apache/aurora/scheduler/offers/ e6b2c55e4f33f9a603157236766425edcaff10e7

  src/test/java/org/apache/aurora/scheduler/config/ 5b502442163581daa4d7954b09c00bdc3680a726

  src/test/java/org/apache/aurora/scheduler/offers/ 6c8434e9cfe46ef63ff10c6f059ecb99981f29a2



Unit tests pass.
Deployed on a scale test cluster and saw that a) the `staticallyBannedOffers` memory leak is fixed
with the correct options and b) assignment time is lower for quickly recurring crons and rescheduled


Jordan Ly
