aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David McLaughlin" <da...@dmclaughlin.com>
Subject Re: Review Request 30225: Modifying update controller to support heartbeats.
Date Wed, 28 Jan 2015 19:42:33 GMT


> On Jan. 27, 2015, 5:49 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
line 227
> > <https://reviews.apache.org/r/30225/diff/1/?file=832014#file832014line227>
> >
> >     Given how sensitive we are to storage lock contention, is there any work in
this block that can be done outside of the lock? From a casual glance the only potentially
time-consuming (non-write) operation is the fetch from the job update store, but I'm not sure
how intensive an operation that will be (or the impact of performing it outside of the context
of a lock).
> 
> Maxim Khutornenko wrote:
>     Moving a fetch call out of the write transaction would create a potential for a race
that may compromise the feature and create hard to reason/trace corner cases. This is not
worth the arguable gain in contention improvement in my opinion.
> 
> David McLaughlin wrote:
>     AFAICT the worst race condition is sending a heartbeat for a job that is already
complete. Are there other race conditions we are protecting against? 
>     
>     Also, I don't see the value in persisting the heartbeats. What is the rationale for
this?
> 
> Maxim Khutornenko wrote:
>     "AFAICT the worst race condition is sending a heartbeat for a job that is already
complete. Are there other race conditions we are protecting against?" - anything inside the
evaluation logic is sensitive to the data provided in the details pull. This ranges from an
invalid "OK" request sent back for a completed update to an internal updater failure due to
outdated expectations. This pattern of querying inside of a write transaction is common in
the updater (e.g. look at changeJobUpdateStatus() called any time an update is evaluated)
and I'd rather not second-guess the correctness of the split.
>     
>     "Also, I don't see the value in persisting the heartbeats. What is the rationale
for this?" - what makes you think they are persisted? The heartbeats are stored in-memory
only (look for coordinatedUpdateStates map).

Thanks for explaining, the presence of write lock threw me off and I misread the code.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review69834
-----------------------------------------------------------


On Jan. 23, 2015, 8:37 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Jan. 23, 2015, 8:37 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated updates
get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs`
expires or the update reaches a terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the alternative is
a state machine change, which is much hairier and will require more changes in the JobUpdaterController.
Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2

>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd

>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173

>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 4c827b183a87b4d97774edbfaa960bd1c3de72a5

>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987

> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message