aurora-reviews mailing list archives

From "David McLaughlin" <da...@dmclaughlin.com>
Subject Re: Review Request 30225: Modifying update controller to support heartbeats.
Date Thu, 29 Jan 2015 00:13:34 GMT


> On Jan. 28, 2015, 7:41 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, lines 259-263
> > <https://reviews.apache.org/r/30225/diff/1/?file=832014#file832014line259>
> >
> >     I am unsure why this is being called inside pulse. Once pulse is activated, only the absence of a pulse can modify the update, right? We don't resume a paused update by receiving a pulse.
> >
> >     So surely the last pulse time would be checked externally to the method that performs the pulse?
> >
> >     If we can remove this, you can get rid of the write lock completely here, because all you need are strongly consistent reads (which we have) to accurately update the cooridinatedUpdateStates map.
> 
> Maxim Khutornenko wrote:
>     An update blocked (not PAUSED) due to a missed pulse can be unblocked by a new pulse. This covers a few important design decisions:
>     - An update can be created blocked by default, awaiting the first pulse to start its progress;
>     - An occasional network partition/delay will not require an explicit external service operation to resume;
>     - A scheduler restart is treated the same as initial update creation - an update is rehydrated and waits for a pulse to resume.
>     
>     More details and scenarios here: https://github.com/maxim111333/incubator-aurora/blob/hb_doc/docs/update-heartbeat.md
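
For illustration only, the blocked-until-pulse behavior described above can be sketched roughly as follows. This is a hypothetical, simplified model and not the code under review: the class name PulseTracker and its methods are assumptions, and only the `blockIfNoPulsesAfterMs` setting comes from the review description. Timestamps are passed in explicitly to keep the sketch deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of pulse tracking for one coordinated update.
 * Not the actual JobUpdateControllerImpl logic.
 */
final class PulseTracker {
  // Sentinel meaning "no pulse received yet" (fresh update or scheduler restart).
  private static final long NO_PULSE = -1L;

  private final long blockIfNoPulsesAfterMs;
  private final AtomicLong lastPulseMs = new AtomicLong(NO_PULSE);

  PulseTracker(long blockIfNoPulsesAfterMs) {
    if (blockIfNoPulsesAfterMs <= 0) {
      throw new IllegalArgumentException("blockIfNoPulsesAfterMs must be positive");
    }
    this.blockIfNoPulsesAfterMs = blockIfNoPulsesAfterMs;
  }

  /**
   * Records a pulse at nowMs and reports whether the update was blocked just
   * before it arrived, i.e. whether this pulse is the one that unblocks it.
   * The check-then-set here is not atomic; a real controller would need to
   * guard the state transition it triggers.
   */
  boolean pulse(long nowMs) {
    boolean wasBlocked = isBlocked(nowMs);
    lastPulseMs.set(nowMs);
    return wasBlocked;
  }

  /** Blocked when no pulse was ever seen or the last one is too old. */
  boolean isBlocked(long nowMs) {
    long last = lastPulseMs.get();
    return last == NO_PULSE || nowMs - last >= blockIfNoPulsesAfterMs;
  }
}
```

In this sketch a newly created tracker starts out blocked, which mirrors both the "created blocked by default" and the "scheduler restart" cases, and a late pulse after a partition unblocks without any separate resume call.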

How do we show the user (via client output or the UI) that the update is currently blocked?


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review70058
-----------------------------------------------------------


On Jan. 23, 2015, 8:37 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Jan. 23, 2015, 8:37 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support to the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater, but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdateController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 4c827b183a87b4d97774edbfaa960bd1c3de72a5
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>

