Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 056F7200D43 for ; Tue, 21 Nov 2017 20:00:59 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 03D6C160BFC; Tue, 21 Nov 2017 19:00:59 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 227D9160BE3 for ; Tue, 21 Nov 2017 20:00:57 +0100 (CET) Received: (qmail 65365 invoked by uid 500); 21 Nov 2017 19:00:57 -0000 Mailing-List: contact reviews-help@aurora.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: reviews@aurora.apache.org Delivered-To: mailing list reviews@aurora.apache.org Received: (qmail 65354 invoked by uid 99); 21 Nov 2017 19:00:57 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 21 Nov 2017 19:00:57 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 366301A0FDA; Tue, 21 Nov 2017 19:00:56 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 4.701 X-Spam-Level: **** X-Spam-Status: No, score=4.701 tagged_above=-999 required=6.31 tests=[DKIM_ADSP_CUSTOM_MED=0.001, FREEMAIL_REPLYTO_END_DIGIT=0.25, HEADER_FROM_DIFFERENT_DOMAINS=0.001, HTML_MESSAGE=2, KAM_LAZY_DOMAIN_SECURITY=1, KAM_LOTSOFHASH=0.25, NML_ADSP_CUSTOM_MED=1.2, RP_MATCHES_RCVD=-0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id 2NR87fMoJBDa; Tue, 21 Nov 2017 19:00:53 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id D6CC45FDE7; Tue, 21 Nov 2017 19:00:52 +0000 (UTC) Received: from reviews.apache.org (unknown [10.41.0.12]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 42A6AE0383; Tue, 21 Nov 2017 19:00:52 +0000 (UTC) Received: from reviews-vm2.apache.org (localhost [IPv6:::1]) by reviews.apache.org (ASF Mail Server at reviews-vm2.apache.org) with ESMTP id 3D3B0C4010B; Tue, 21 Nov 2017 19:00:52 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============5352369408324667390==" MIME-Version: 1.0 Subject: Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs From: Jordan Ly To: Bill Farner , Santhosh Kumar Shanmugham , Jordan Ly Cc: Aurora , Stephan Erb , Aurora ReviewBot , David McLaughlin Date: Tue, 21 Nov 2017 19:00:52 -0000 Message-ID: <20171121190052.44937.30282@reviews-vm2.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: Jordan Ly X-ReviewGroup: Aurora X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/63536/ X-Sender: Jordan Ly References: <20171121175445.44937.8653@reviews-vm2.apache.org> In-Reply-To: <20171121175445.44937.8653@reviews-vm2.apache.org> X-ReviewBoard-Diff-For: src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java X-ReviewBoard-Diff-For: src/test/sh/org/apache/aurora/e2e/partition_aware.aurora X-ReviewBoard-Diff-For: src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java Reply-To: Jordan Ly X-ReviewRequest-Repository: aurora archived-at: Tue, 21 Nov 2017 19:00:59 -0000 --===============5352369408324667390== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On Nov. 21, 2017, 5:54 p.m., David McLaughlin wrote: > > The outstanding feedback is: > > > > * Discussion on using a mock with capture (my preference) vs using a fake implementation. > > * Concern over how to handle a partition during a transient state. My preference is to move immediately to LOST (existing behavior). > > * Confusion over what Disable meant (now removed). > > > > I would appreciate other thoughts on these, as I'm eager to close this work out and move on to other things. > > Stephan Erb wrote: > https://github.com/apache/aurora/blob/master/docs/reference/task-lifecycle.md will also need an update. I am OK if you ignore the image as it is already outdated, but the text should reflect reality. My thoughts: * I think that the current tests are alright -- they test the behavior of PartitionManager in isolation and are fairly easy to read. I imagine that you mean 'integration test' by 'fake implementation', but please correct me if I am mistaken. On a side note: do we ever explicitly test that something is never called (i.e. `expect(...).times(0)`)? I know EasyMock's default behavior is that if something is called that isn't supposed to be called, it will fail. However, when reading tests, I feel like it the explicit declaration flows smoother in my mind. * +1 for the existing behavior. * +1 to removing Disable - Jordan ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/63536/#review191603 ----------------------------------------------------------- On Nov. 21, 2017, 6:42 p.m., David McLaughlin wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/63536/ > ----------------------------------------------------------- > > (Updated Nov. 21, 2017, 6:42 p.m.) > > > Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill Farner. > > > Repository: aurora > > > Description > ------- > > This is my prototype code for adding partition-awareness to Aurora. There is a proposal document to accompany this here: https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit# > > I'd like feedback on the high-level approach before adding unit tests, metrics, logging, etc. > > > Diffs > ----- > > RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 > api/src/main/thrift/org/apache/aurora/gen/api.thrift 1d369263d779b549b9304018437f535ddc855966 > docs/reference/configuration.md f647bcf86e84c3b08848e5f54f6a5ad95d55fc8a > docs/reference/scheduler-configuration.md d17541f9458650240983276b69f749a854057aa8 > docs/reference/task-lifecycle.md 8ec0077f7c5caef9c012db02ffd40c704fc2b347 > examples/vagrant/upstart/aurora-scheduler.conf 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 > src/main/java/org/apache/aurora/scheduler/base/Conversions.java 33cc012a2cad929b0dd1ce236597b870cfc5aba0 > src/main/java/org/apache/aurora/scheduler/base/Jobs.java 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 > src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 7c223eae69503fe1d5bf34c430438637abcbcb9b > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java 5e83b2acdde7198d16427d4031e9772f78612554 > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java PRE-CREATION > src/main/java/org/apache/aurora/scheduler/state/SideEffect.java b91a0852e968b4aa9d74801601671cb61af3648a > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 9989ed441cd9bc442e6472768880ce7924c3bdd9 > src/main/java/org/apache/aurora/scheduler/state/StateModule.java c03fff11ea3a4086f9daaa8b07315006c1b481e4 > src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 > src/main/python/apache/aurora/client/cli/jobs.py b79ae56bee0e5692cacf1e66f4a4126b06aaffdc > src/main/python/apache/aurora/config/schema/base.py 18ce826363009e1cc0beac5cce4edf42610d487e > src/main/python/apache/aurora/config/thrift.py bedf8cd6641e1b1a930602791b758d584af4891c > src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 13f656f241a8a9a3d339f4053f165070c2669ef3 > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java c2d875bb5c393dd95d75251fe86dc649ceba7bd9 > src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java 7b0429109e9a7795e559db264e7737fc55ff0169 > src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java PRE-CREATION > src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 0366cd6e9ddba0c3b9c88ffb50738767a352a17a > src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 8d6c3fff0af2df39bb929f760b862a2edf5d6fca > src/test/python/apache/aurora/client/cli/test_task.py 186cb2737ba8e169819b7d54f86a7344a669b6cb > src/test/python/apache/aurora/config/test_thrift.py 7a1567a9b67917072bb0ba3eea5857e968375f4d > src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh f0819fb7ac758ad1229a76fd9794b393400e9f63 > > > Diff: https://reviews.apache.org/r/63536/diff/10/ > > > Testing > ------- > > Manual testing in Vagrant by stopping and starting the Mesos agent. With three jobs: > > 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED directly to LOST) > 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a while before moving to LOST) > 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED indefinitely) > > I also verified tasks are able to transition to various states (back to RUNNING or moving to FAILED, etc.) when you turn the agent back on. > > > File Attachments > ---------------- > > Task in PARTITIONED state > https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png > Task back into running when partition resolved > https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png > Compaction of PARTITIONED cycles (note timestamps) > https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png > > > Thanks, > > David McLaughlin > > --===============5352369408324667390==--