aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bill Farner <wfar...@apache.org>
Subject Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs
Date Thu, 09 Nov 2017 23:32:13 GMT


> On Nov. 9, 2017, 9:23 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 76 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885207#file1885207line76>
> >
> >     Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already helping
you perform a CAS.  If we had a _true_ CAS operation, you could very safely avoid the need
to cancel futures.  You can't do that with the status only since the task can now cycle with
`PARTITIONED`.
> >     
> >     What do you think about using the last transition timestamp to do this?
> >     
> >     ```java
> >     long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
> >     executor.schedule(() -> {
> >       storage.write(... {
> >         if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) {
> >           // proceed
> >         } else {
> >           // cas failed, the task has since transitioned
> >         }
> >       }
> >     
> >     }, delay, TimeUnit.SECONDS);
> >     ```
> 
> David McLaughlin wrote:
>     The other motivation (other than correctness, which I'm pretty sure we now have)
for removing and cancelling futures was to get the tasks off the heap. Of course, that's not
actually happening here because I'm not using a ScheduledThreadPoolExecutor (and setting removeOnCancelPolicy
to true). With the AsyncUtil suggestion above, I'll be sure to set that to make it clear what
I'm doing. 
>     
>     If we still think it's necessary to include this timestamp check, I'm happy to do
that (although we'd have to refetch the task from storage inside the storage.write of course).
> 
> Bill Farner wrote:
>     > get the tasks off the heap
>     
>     Smells like premature optimization.  Either, you can certainly factor this so you
only maintain a reference to the task ID and the timestamp.
> 
> David McLaughlin wrote:
>     Isn't premature optimization usually brandished in response to complexity though?
I don't see the complexity here.

I'm simply pointing out that we shouldn't maintain the map if the primary motivation is to
reduce heap consumption.

I do think the code would be easier to reason about if future cancellation and maintaining
the additional map were not necessary.  I don't feel strongly about this, however.


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190607
-----------------------------------------------------------


On Nov. 8, 2017, 4:48 p.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2017, 4:48 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is my prototype code for adding partition-awareness to Aurora. There is a proposal
document to accompany this here: https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, metrics, logging,
etc.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift c869493c06499340d73e1b219e17a0d7d8b5ead9

>   examples/vagrant/upstart/aurora-scheduler.conf 5ca3caef03b6632cd4dbf47711b1ef183f6a6449

>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 33cc012a2cad929b0dd1ce236597b870cfc5aba0

>   src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java PRE-CREATION

>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java b91a0852e968b4aa9d74801601671cb61af3648a

>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 9989ed441cd9bc442e6472768880ce7924c3bdd9

>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java d72f055749801ee9d6f31f60857cc795d0ed7ab1

>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1

>   src/main/python/apache/aurora/client/cli/jobs.py b79ae56bee0e5692cacf1e66f4a4126b06aaffdc

>   src/main/python/apache/aurora/config/schema/base.py 18ce826363009e1cc0beac5cce4edf42610d487e

>   src/main/python/apache/aurora/config/thrift.py bedf8cd6641e1b1a930602791b758d584af4891c

> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> -------
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED directly
to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a while before
moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED indefinitely)
> 
> I also verified tasks are able to transition to various states (back to RUNNING or moving
to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> ----------------
> 
> Task in PARTITIONED state
>   https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message