Subject: Re: Review Request 63199: Refactor staticallyBannedOffers into a LRU cache
From: Reza Motamedi
To: Bill Farner, David McLaughlin, Stephan Erb, Santhosh Kumar Shanmugham
Cc: Aurora, Reza Motamedi, Jordan Ly
Date: Mon, 23 Oct 2017 16:39:30 -0000
Message-ID: <20171023163930.11020.32675@reviews-vm2.apache.org>
In-Reply-To: <20171021055347.61420.80699@reviews-vm2.apache.org>
References: <20171021055347.61420.80699@reviews-vm2.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/63199/
X-ReviewRequest-Repository: aurora
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="===============2709236466316378524=="

--===============2709236466316378524==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63199/#review188953
-----------------------------------------------------------

src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
Lines 109 (patched)

Would it be easy to find the "correct" value here?
The correct value seems to be correlated with a lot of things. How sensitive is the stability of clusters to this value?

- Reza Motamedi


On Oct. 21, 2017, 5:53 a.m., Jordan Ly wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63199/
> -----------------------------------------------------------
>
> (Updated Oct. 21, 2017, 5:53 a.m.)
>
>
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb, and Bill Farner.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Using the new `hold_offers_forever` option, it is possible for the `staticallyBannedOffers` to grow very large in size as we never release offers.
>
> As an alternative to https://reviews.apache.org/r/63121/, I propose changing `staticallyBannedOffers` into an LRU cache which expires entries after `min_offer_hold_time` + `offer_hold_jitter_window` (referred to as `maxOfferHoldTime`), while also taking an option for a maximum size for the cache. I believe that this approach has a couple of benefits:
>
> 1. The current behavior of `staticallyBannedOffers` is (kinda) preserved. Entries will no longer be removed when the offer is used, but they will be removed within `maxOfferHoldTime`. This means cluster operators will not have to think about the new `offer_static_ban_cache_max_size` if they aren't affected by the memory leak now.
> 2. Cluster operators that use Aurora as a single framework and hold offers indefinitely can cap the size of the cache to avoid the memory leak.
> 3. Using an LRU cache greatly benefits quickly recurring crons and job updates.
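
To make the trade-off behind the question above concrete, here is a minimal, JDK-only sketch of a ban cache that both expires entries after a fixed hold time and caps its size with LRU eviction. It is an illustration under assumptions, not the code in the patch: the class name `BannedOfferCache`, its methods, the opaque string key, and the injected clock are all hypothetical, and the real change may well rely on a library cache instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch: a size-capped LRU set whose entries also expire after a hold time. */
final class BannedOfferCache {
  private final long maxHoldMillis;          // stands in for maxOfferHoldTime
  private final LongSupplier clock;          // injected so expiry can be tested deterministically
  private final LinkedHashMap<String, Long> entries;

  BannedOfferCache(int maxSize, long maxHoldMillis, LongSupplier clock) {
    this.maxHoldMillis = maxHoldMillis;
    this.clock = clock;
    // accessOrder=true keeps the least recently used key at the head of the map.
    this.entries = new LinkedHashMap<String, Long>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        // Evict the least recently used entry once the cap is exceeded.
        return size() > maxSize;
      }
    };
  }

  /** Records a ban, stamped with the current time. */
  void ban(String offerKey) {
    entries.put(offerKey, clock.getAsLong());
  }

  /** Returns true if the key is banned and has not yet outlived the hold time. */
  boolean isBanned(String offerKey) {
    Long bannedAt = entries.get(offerKey);   // get() also refreshes LRU recency
    if (bannedAt == null) {
      return false;
    }
    if (clock.getAsLong() - bannedAt >= maxHoldMillis) {
      entries.remove(offerKey);              // expired: drop lazily on lookup
      return false;
    }
    return true;
  }

  int size() {
    return entries.size();
  }
}
```

In this sketch a cap that is too small only costs repeated work: an evicted ban is simply rediscovered the next time the offer is evaluated. A cap that is too large only costs memory. Whether the scheduler behaves the same way depends on details of the patch that are not shown here.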
>
>
> Diffs
> -----
>
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 5a9099bf9dd292249d72bc3a7604fbb3394f30ea
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 7011a4cc9eea827cdd54698aaed1a653774bce7f
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java e060f2073dce4d2486d1ee2bfd873fe75167c6d0
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java e6b2c55e4f33f9a603157236766425edcaff10e7
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 5b502442163581daa4d7954b09c00bdc3680a726
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 6c8434e9cfe46ef63ff10c6f059ecb99981f29a2
>
>
> Diff: https://reviews.apache.org/r/63199/diff/4/
>
>
> Testing
> -------
>
> Unit tests pass.
> Deployed on a scale test cluster and saw that a) the `staticallyBannedOffers` memory leak is fixed with the correct options and b) assignment time is lower for quickly recurring crons and rescheduled jobs.
>
>
> Thanks,
>
> Jordan Ly
>
>

--===============2709236466316378524==--