Mailing-List: contact reviews-help@aurora.apache.org; run by ezmlm
Reply-To: reviews@aurora.apache.org
Content-Type: multipart/alternative; boundary="===============8823063275675419446=="
MIME-Version: 1.0
Subject: Re: Review Request 33689: Updated scheduler to process status updates asynchronously in batches.
From: "Ben Mahler"
To: "Bill Farner", "Maxim Khutornenko"
Cc: "Aurora", "Aurora ReviewBot", "Ben Mahler", "Stephan Erb"
Date: Thu, 07 May 2015 00:27:07 -0000
Message-ID: <20150507002707.1563.71017@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
Auto-Submitted: auto-generated
Sender: "Ben Mahler"
X-ReviewGroup: Aurora
X-ReviewRequest-URL: https://reviews.apache.org/r/33689/
X-Sender: "Ben Mahler"
References: <20150506233450.1562.55929@reviews.apache.org>
In-Reply-To: <20150506233450.1562.55929@reviews.apache.org>
Reply-To: "Ben Mahler"
X-ReviewRequest-Repository: aurora

--===============8823063275675419446==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33689/
-----------------------------------------------------------

(Updated May 7, 2015, 12:27 a.m.)


Review request for Aurora, Maxim Khutornenko and Bill Farner.


Changes
-------

Fixed a style regression.


Bugs: AURORA-1228
    https://issues.apache.org/jira/browse/AURORA-1228


Repository: aurora


Description
-------

Status updates are now processed asynchronously in batches, to insulate throughput from the expensive storage resource. Updates are placed into a queue and consumed by another thread. If many updates arrive while we're storing a batch of updates, they are processed together as a batch rather than individually.

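
The queue-and-drain pattern described above could be sketched roughly as follows. This is only an illustration, not the code under review: the class name BatchingUpdateQueue, its methods, and the batchStore callback (a stand-in for the expensive storage write) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of async batching; not taken from the patch in this review. */
final class BatchingUpdateQueue<T> {
  private final BlockingQueue<T> pending = new LinkedBlockingQueue<>();
  // Assumed stand-in for the expensive storage write; invoked once per batch.
  private final Consumer<List<T>> batchStore;

  BatchingUpdateQueue(Consumer<List<T>> batchStore) {
    this.batchStore = batchStore;
  }

  /** Called by the producer thread; returns without waiting on storage. */
  void enqueue(T update) {
    pending.add(update);
  }

  /** Blocks until one update is available, then drains everything else already queued. */
  List<T> takeBatch() throws InterruptedException {
    List<T> batch = new ArrayList<>();
    batch.add(pending.take());
    pending.drainTo(batch);
    return batch;
  }

  /** Consumer loop; meant to run on its own thread until that thread is interrupted. */
  void runConsumer() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        batchStore.accept(takeBatch());
      }
    } catch (InterruptedException e) {
      // Restore the interrupt flag so the owning thread can observe it and exit.
      Thread.currentThread().interrupt();
    }
  }
}
```

In this sketch, updates that pile up while batchStore is busy are all picked up by the next takeBatch() call, so the number of storage writes grows with the number of batches rather than with the number of updates.
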
Diffs (updated)
-----

  src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 7bb64dd913f0fe2fede95d50a061043dbb794ab4
  src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java 45de15a57baf7a2f7d437b590935714e28777f35
  src/main/java/org/apache/aurora/scheduler/SchedulerModule.java d3ac176e9402a33fd2074b0737313458120da9e2
  src/main/java/org/apache/aurora/scheduler/UserTaskLauncher.java 0ce9c9d4cf75f9add260f285115b1d60786ded57
  src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 4d589a33a2933b0cb6caf85abfae45c5e635c3ce
  src/main/java/org/apache/aurora/scheduler/mesos/Driver.java c7e45a89ceaa2c310feb610091eec0b04187860e
  src/main/java/org/apache/aurora/scheduler/mesos/MesosSchedulerImpl.java 9b8ab7c1027731f9d3f6cae77b85272ea63354d4
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java da2d5df2e053e6e1b8fb08d6813dff9eac9777f8
  src/test/java/org/apache/aurora/scheduler/UserTaskLauncherTest.java 32432322753799562d671db39c0d7fa308d962ff
  src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 422d5a9a42310979752eb7282658316c2b772419
  src/test/java/org/apache/aurora/scheduler/mesos/MesosSchedulerImplTest.java abdeee49858fc439c27911c4eb544bf8e8c931d4

Diff: https://reviews.apache.org/r/33689/diff/


Testing
-------

Ran the benchmark to confirm that this improves status update throughput substantially:

Before: Around 100 updates per second for a 5ms storage latency. Much worse for higher latencies.
After: Around 4k-5k updates per second for a 5ms storage latency, down to 3k updates per second for 100ms storage latency.

Updated unit tests for the new invariants:
  * TaskLaunchers are responsible for acknowledging updates.
  * UserTaskLauncher processes updates asynchronously.


Thanks,

Ben Mahler

--===============8823063275675419446==--