Subject: Re: Review Request 42332: Turn TaskHistoryPruner into a service and trigger shutdown on pruning failure.
From: "Maxim Khutornenko"
To: "Maxim Khutornenko", "John Sirois"
Cc: "Zameer Manji", "Aurora"
Date: Thu, 21 Jan 2016 22:19:26 -0000

> On Jan. 21, 2016, 6:51 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java, lines 152-161
> >
> > Am I reading this correctly as a busy loop that consumes 100% of a thread's CPU while waiting for something that may never happen? I don't think this is an acceptable solution.
> >
> > Perhaps it's time to refactor the task pruner into an AbstractScheduledService? I have always felt that the task pruner's approach of holding on to task IDs for 2 days just to act on them once isn't very efficient. What if, instead of acting on each individual task ID, we had a periodic run loop (say, every 30 seconds) that prunes by job key?
> >
> > Implementation-wise, it could be a Set of unique job keys that we populate on every TaskStateChange event. A runOneIteration() would poll that set and apply both the latency and max_per_job thresholds to all related terminal tasks within the same iteration.
> >
> > The only downside of the above is a somewhat increased history count between cleanup runs, but given that our current thresholds are chosen mostly arbitrarily, I think that should be acceptable.
>
> John Sirois wrote:
>     I think my Future/Queue suggestion above solves the busy loop with no liveness penalty. That might allow your batching change suggestion to happen in a separate follow-up RB.
>
> Zameer Manji wrote:
>     +1 to John here. I think we are overdue for a less complex and less heavy pruner, but I would prefer to keep this RB focused on failure propagation. I am open to a follow-up ticket and RB. Maxim, if you agree, I can create a ticket that tracks the work you just proposed.
>
>     Right now, I think I will use John's Future/Queue suggestion to remove the busy loop.

I am fine with the follow-up ticket if you feel it's too much to lift within this RB.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review115663
-----------------------------------------------------------


On Jan. 20, 2016, 10:39 p.m., Zameer Manji wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2016, 10:39 p.m.)
>
>
> Review request for Aurora, John Sirois and Maxim Khutornenko.
>
>
> Bugs: AURORA-1582
>     https://issues.apache.org/jira/browse/AURORA-1582
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Task pruning is key to operating a large cluster, and failure to prune should trigger shutdown to prevent unbounded growth of storage. This patch turns `TaskHistoryPruner` into a service which propagates failures from pruning attempts to the `ServiceManager`. It also completes a TODO by removing a test for behaviour that is very awkward to test.
>
>
> Diffs
> -----
>
>   src/main/java/org/apache/aurora/scheduler/pruning/PruningModule.java 735199ac1ccccab343c24471890aa330d6635c26
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 2064089937f5178b1413d386a312f4173a0e35fb
>   src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 295960f13693c6ba0d7075a8ef7f9680a91ae69d
>
> Diff: https://reviews.apache.org/r/42332/diff/
>
>
> Testing
> -------
>
> ./gradlew build -Pq
> e2e tests
>
>
> Thanks,
>
> Zameer Manji
>
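
For reference, the job-key batching idea from Maxim's comment could look roughly like the sketch below. It is only an illustration under stated assumptions, not the Aurora implementation: the class `JobKeyPrunerSketch`, its `TerminalTask` record, the in-memory map standing in for task storage, and the method `onTerminalTask` are all hypothetical names. It uses only the JDK; in the scheduler the periodic call would presumably come from Guava's `AbstractScheduledService.runOneIteration()`, which is left out here. One further assumption: a job key stays in the set for as long as it still has retained terminal tasks, so the age threshold can be applied on a later pass.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Aurora code base. */
class JobKeyPrunerSketch {
  /** Minimal stand-in for a terminal task record. */
  static final class TerminalTask {
    final String taskId;
    final Instant finishedAt;

    TerminalTask(String taskId, Instant finishedAt) {
      this.taskId = taskId;
      this.finishedAt = finishedAt;
    }
  }

  private final Duration retention; // age threshold for terminal tasks
  private final int maxPerJob;      // cap on retained terminal tasks per job
  // Stand-in for task storage: terminal tasks grouped by job key.
  private final Map<String, List<TerminalTask>> tasksByJob = new HashMap<>();
  // Unique job keys that may still have prunable history.
  private final Set<String> pendingJobKeys = new HashSet<>();

  JobKeyPrunerSketch(Duration retention, int maxPerJob) {
    this.retention = retention;
    this.maxPerJob = maxPerJob;
  }

  /** Called for each task state change that lands in a terminal state. */
  synchronized void onTerminalTask(String jobKey, String taskId, Instant finishedAt) {
    tasksByJob.computeIfAbsent(jobKey, k -> new ArrayList<>())
        .add(new TerminalTask(taskId, finishedAt));
    pendingJobKeys.add(jobKey);
  }

  /** One periodic pass over the pending job keys; returns the pruned task IDs. */
  synchronized List<String> runOneIteration(Instant now) {
    List<String> pruned = new ArrayList<>();
    for (String jobKey : new ArrayList<>(pendingJobKeys)) {
      List<TerminalTask> tasks = tasksByJob.get(jobKey);
      // Newest first, so the per-job cap keeps the most recent history.
      tasks.sort(Comparator.comparing((TerminalTask t) -> t.finishedAt).reversed());
      List<TerminalTask> kept = new ArrayList<>();
      for (TerminalTask task : tasks) {
        boolean expired = !task.finishedAt.plus(retention).isAfter(now);
        if (expired || kept.size() >= maxPerJob) {
          pruned.add(task.taskId);
        } else {
          kept.add(task);
        }
      }
      if (kept.isEmpty()) {
        // Nothing left to age out, so the job key can be forgotten.
        tasksByJob.remove(jobKey);
        pendingJobKeys.remove(jobKey);
      } else {
        tasksByJob.put(jobKey, kept);
      }
    }
    return pruned;
  }

  /** Number of terminal tasks currently retained for a job key. */
  synchronized int retainedCount(String jobKey) {
    List<TerminalTask> tasks = tasksByJob.get(jobKey);
    return tasks == null ? 0 : tasks.size();
  }
}
```

Compared with scheduling one delayed action per task ID, this keeps at most one set entry per job, at the cost noted in the comment: history can exceed the thresholds for up to one interval between runs.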