aurora-reviews mailing list archives

From Jordan Ly <jordan....@gmail.com>
Subject Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs
Date Thu, 16 Nov 2017 22:22:14 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191256
-----------------------------------------------------------


Fix it, then Ship it!




Overall LGTM! You may want to add an entry to the release notes explaining that some bugs on the Mesos side need to be fixed before this is enabled.


src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 90-91 (patched)
<https://reviews.apache.org/r/63536/#comment268977>

    Can this just be:
    ```
    if (stateChange.getNewState().equals(ScheduleStatus.PARTITIONED))
    ```



src/main/python/apache/aurora/config/schema/base.py
Line 157 (patched)
<https://reviews.apache.org/r/63536/#comment268978>

    Can you explain what `Disable` does? If it is `RescheduleImmediately`, then the default value of 0 on `PartitionPolicy` is sufficient.
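
A minimal sketch of the policy semantics described in the quoted Testing section, assuming the policy carries only a `reschedule` flag and a delay in seconds that defaults to 0. The class, enum, and method names below are hypothetical illustrations, not the code in this patch. It shows why a separate "reschedule immediately" option may be redundant: with rescheduling enabled and a zero delay, a partitioned task moves to LOST right away.

```java
import java.util.Optional;

/**
 * Hypothetical sketch (not the classes in this patch) of how a partition
 * policy could decide what happens to a task sitting in PARTITIONED.
 */
final class PartitionPolicySketch {

  /** What the scheduler should do with a PARTITIONED task right now. */
  enum Action { STAY_PARTITIONED, TRANSITION_TO_LOST }

  /** Assumed policy shape: a reschedule flag plus a delay in seconds. */
  static final class Policy {
    final boolean reschedule;
    final long delaySecs;

    Policy(boolean reschedule, long delaySecs) {
      if (delaySecs < 0) {
        throw new IllegalArgumentException("delaySecs must be >= 0");
      }
      this.reschedule = reschedule;
      this.delaySecs = delaySecs;
    }

    /** Assumed defaults: rescheduling enabled, no delay. */
    static Policy defaults() {
      return new Policy(true, 0);
    }
  }

  /** Decides the next step for a task that has been PARTITIONED for secondsPartitioned. */
  static Action decide(Optional<Policy> policy, long secondsPartitioned) {
    if (!policy.isPresent()) {
      // No policy: keep the pre-existing behavior and move straight to LOST.
      return Action.TRANSITION_TO_LOST;
    }
    Policy p = policy.get();
    if (!p.reschedule) {
      // reschedule=False: wait indefinitely for the agent to come back.
      return Action.STAY_PARTITIONED;
    }
    // reschedule=True: wait out the delay, then move to LOST.
    // With the default delay of 0 this is the same as rescheduling immediately.
    return secondsPartitioned >= p.delaySecs
        ? Action.TRANSITION_TO_LOST
        : Action.STAY_PARTITIONED;
  }

  public static void main(String[] args) {
    System.out.println(decide(Optional.empty(), 0));
    System.out.println(decide(Optional.of(new Policy(true, 300)), 60));
    System.out.println(decide(Optional.of(new Policy(true, 300)), 300));
    System.out.println(decide(Optional.of(new Policy(false, 0)), 86_400));
    System.out.println(decide(Optional.of(Policy.defaults()), 0));
  }
}
```

Running `main` prints TRANSITION_TO_LOST, STAY_PARTITIONED, TRANSITION_TO_LOST, STAY_PARTITIONED, TRANSITION_TO_LOST, one per line: the three tested jobs plus the default-policy case, which behaves exactly like having no delay at all.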


- Jordan Ly


On Nov. 16, 2017, 1:54 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> -----------------------------------------------------------
> 
> (Updated Nov. 16, 2017, 1:54 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is my prototype code for adding partition-awareness to Aurora. An accompanying proposal document is here: https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, metrics, logging, etc.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 1d369263d779b549b9304018437f535ddc855966
>   examples/vagrant/upstart/aurora-scheduler.conf 5ca3caef03b6632cd4dbf47711b1ef183f6a6449
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 33cc012a2cad929b0dd1ce236597b870cfc5aba0
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 7c223eae69503fe1d5bf34c430438637abcbcb9b
>   src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java 5e83b2acdde7198d16427d4031e9772f78612554
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java b91a0852e968b4aa9d74801601671cb61af3648a
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 9989ed441cd9bc442e6472768880ce7924c3bdd9
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java c03fff11ea3a4086f9daaa8b07315006c1b481e4
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1
>   src/main/python/apache/aurora/client/cli/jobs.py b79ae56bee0e5692cacf1e66f4a4126b06aaffdc
>   src/main/python/apache/aurora/config/schema/base.py 18ce826363009e1cc0beac5cce4edf42610d487e
>   src/main/python/apache/aurora/config/thrift.py bedf8cd6641e1b1a930602791b758d584af4891c
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 13f656f241a8a9a3d339f4053f165070c2669ef3
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java c2d875bb5c393dd95d75251fe86dc649ceba7bd9
>   src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java 7b0429109e9a7795e559db264e7737fc55ff0169
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java PRE-CREATION
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 0366cd6e9ddba0c3b9c88ffb50738767a352a17a
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 8d6c3fff0af2df39bb929f760b862a2edf5d6fca
>   src/test/python/apache/aurora/client/cli/test_task.py 186cb2737ba8e169819b7d54f86a7344a669b6cb
>   src/test/python/apache/aurora/config/test_thrift.py 7a1567a9b67917072bb0ba3eea5857e968375f4d
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh f0819fb7ac758ad1229a76fd9794b393400e9f63
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/6/
> 
> 
> Testing
> -------
> 
> Tested manually in Vagrant by stopping and starting the Mesos agent, with three jobs:
> 
> 1) No PartitionPolicy (verified the existing behavior of moving from PARTITIONED directly to LOST)
> 2) PartitionPolicy with a custom delay_secs (verified the task sat in PARTITIONED for a while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified the task sat in PARTITIONED indefinitely)
> 
> I also verified that tasks can transition to other states (back to RUNNING, to FAILED, etc.) when the agent is turned back on.
> 
> 
> File Attachments
> ----------------
> 
> Task in PARTITIONED state
>   https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

