Return-Path:
X-Original-To: apmail-aurora-reviews-archive@minotaur.apache.org
Delivered-To: apmail-aurora-reviews-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 76EFA17D1F for ; Mon, 11 May 2015 21:59:47 +0000 (UTC)
Received: (qmail 62631 invoked by uid 500); 11 May 2015 21:59:47 -0000
Delivered-To: apmail-aurora-reviews-archive@aurora.apache.org
Received: (qmail 62577 invoked by uid 500); 11 May 2015 21:59:47 -0000
Mailing-List: contact reviews-help@aurora.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: reviews@aurora.apache.org
Delivered-To: mailing list reviews@aurora.apache.org
Received: (qmail 62556 invoked by uid 99); 11 May 2015 21:59:47 -0000
Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 11 May 2015 21:59:47 +0000
Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id C8CCD1D7BC4; Mon, 11 May 2015 21:59:47 +0000 (UTC)
Content-Type: multipart/alternative; boundary="===============3518815152522995285=="
MIME-Version: 1.0
Subject: Re: Review Request 33689: Updated scheduler to process status updates asynchronously in batches.
From: "Ben Mahler"
To: "Bill Farner", "Maxim Khutornenko"
Cc: "Ben Mahler", "Stephan Erb", "Aurora"
Date: Mon, 11 May 2015 21:59:47 -0000
Message-ID: <20150511215947.18931.29939@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
Auto-Submitted: auto-generated
Sender: "Ben Mahler"
X-ReviewGroup: Aurora
X-ReviewRequest-URL: https://reviews.apache.org/r/33689/
X-Sender: "Ben Mahler"
References: <20150510121025.12559.97618@reviews.apache.org>
In-Reply-To: <20150510121025.12559.97618@reviews.apache.org>
Reply-To: "Ben Mahler"
X-ReviewRequest-Repository: aurora

--===============3518815152522995285==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

> On May 10, 2015, 12:10 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/UserTaskLauncher.java, line 211
> >
> > You mentioned that updates should be processed in FIFO order. As we have effectively lost the entire batch here, do we have to clear the pendingUpdates queue to prevent out-of-order processing?
>
> Ben Mahler wrote:
>     Out-of-order processing occurs when the scheduler processes updates in a different order than the order in which they were delivered (via the statusUpdate() callback). Dropping a batch doesn't cause a re-ordering, unless there's a bug that I'm missing here. Why do you think that ordering is an issue when a batch is dropped?

Chatting with Maxim, it probably wasn't clear to you that status updates are not pipelined. That is, only one update for a task will be in flight at a time. Until you acknowledge U1, you will not be sent U2. If this were not the case, it would definitely lead to re-ordering, as you said. When pipelining is added, it will be done in a manner that doesn't break existing schedulers. :)

- Ben

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/33689/#review83184
-----------------------------------------------------------

On May 11, 2015, 6:55 p.m., Ben Mahler wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33689/
> -----------------------------------------------------------
>
> (Updated May 11, 2015, 6:55 p.m.)
>
>
> Review request for Aurora, Maxim Khutornenko and Bill Farner.
>
>
> Bugs: AURORA-1228
>     https://issues.apache.org/jira/browse/AURORA-1228
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Status updates are now processed asynchronously, with batching, to insulate throughput from the expensive storage resource. Updates are placed into a queue and consumed by another thread. If many updates arrive while we're storing a batch of updates, these will be processed together as a batch rather than individually.
>
>
> Diffs
> -----
>
>   src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 7bb64dd913f0fe2fede95d50a061043dbb794ab4
>   src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java 45de15a57baf7a2f7d437b590935714e28777f35
>   src/main/java/org/apache/aurora/scheduler/SchedulerModule.java d3ac176e9402a33fd2074b0737313458120da9e2
>   src/main/java/org/apache/aurora/scheduler/UserTaskLauncher.java 0ce9c9d4cf75f9add260f285115b1d60786ded57
>   src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java 4d589a33a2933b0cb6caf85abfae45c5e635c3ce
>   src/main/java/org/apache/aurora/scheduler/mesos/Driver.java c7e45a89ceaa2c310feb610091eec0b04187860e
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosSchedulerImpl.java 9b8ab7c1027731f9d3f6cae77b85272ea63354d4
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java da2d5df2e053e6e1b8fb08d6813dff9eac9777f8
>   src/test/java/org/apache/aurora/scheduler/UserTaskLauncherTest.java 32432322753799562d671db39c0d7fa308d962ff
>   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 422d5a9a42310979752eb7282658316c2b772419
>   src/test/java/org/apache/aurora/scheduler/mesos/MesosSchedulerImplTest.java abdeee49858fc439c27911c4eb544bf8e8c931d4
>
> Diff: https://reviews.apache.org/r/33689/diff/
>
>
> Testing
> -------
>
> Ran the benchmark to confirm that this improves status update throughput substantially:
>
> Before: Around 100 updates per second for a 5ms storage latency. Much worse for higher latencies.
> After: Around 4k-5k updates per second for a 5ms storage latency, down to 3k updates per second for 100ms storage latency.
>
> Updated unit tests for the new invariants:
>
> * TaskLaunchers are responsible for acknowledging updates.
> * UserTaskLauncher processes updates asynchronously.
>
>
> Thanks,
>
> Ben Mahler
>

--===============3518815152522995285==--
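
The queue-and-drain approach in the Description can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code under review: the class name StatusUpdateBatcher, the use of plain strings for updates, and the two Consumer callbacks standing in for the storage write and the acknowledgement are all hypothetical. Only the pendingUpdates queue name comes from the review thread. Acknowledging after the batch has been stored is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of asynchronous, batched status update processing. */
final class StatusUpdateBatcher {
  // FIFO queue: filled by the callback thread, emptied by the consumer thread.
  private final BlockingQueue<String> pendingUpdates = new LinkedBlockingQueue<>();
  // Stand-in for the expensive storage write (assumption, not Aurora's API).
  private final Consumer<List<String>> storage;
  // Stand-in for acknowledging a single update (assumption, not Aurora's API).
  private final Consumer<String> acknowledger;

  StatusUpdateBatcher(Consumer<List<String>> storage, Consumer<String> acknowledger) {
    this.storage = storage;
    this.acknowledger = acknowledger;
  }

  /** Called from the status update callback: cheap, never touches storage. */
  void enqueue(String update) {
    pendingUpdates.add(update);
  }

  /** Waits for one update, then drains everything else that queued up meanwhile. */
  List<String> takeBatch() {
    List<String> batch = new ArrayList<>();
    try {
      batch.add(pendingUpdates.take());
      pendingUpdates.drainTo(batch); // preserves FIFO order
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
    }
    return batch;
  }

  /** One iteration of the consumer loop: store the batch once, then acknowledge. */
  void processOneBatch() {
    List<String> batch = takeBatch();
    if (batch.isEmpty()) {
      return; // interrupted before any update arrived
    }
    storage.accept(batch);
    for (String update : batch) {
      acknowledger.accept(update);
    }
  }
}
```

A consumer thread would call processOneBatch() in a loop. Because the queue is drained in arrival order and each batch is stored as a unit, updates within and across batches keep the order in which they were enqueued, and the storage cost is paid once per batch rather than once per update.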