aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David McLaughlin <da...@dmclaughlin.com>
Subject Re: Review Request 50168: Add rollback functionality to the scheduler
Date Thu, 28 Jul 2016 23:11:24 GMT


> On July 28, 2016, 8:44 p.m., David McLaughlin wrote:
> > For me ROLLED_BACK has a clear meaning - a job that was rolled back due to a failed
health check in the update process. Overloading like this is going to lead to ambiguity and
I wouldn't be comfortable exposing this functionality to users. 
> > 
> > Further to that, having to reinsert the code lock is basically a code smell that
this approach is not what the current code was designed for. In fact this approach should
have failed the unit tests because ROLLED_FORWARD (or any other terminal state) to ROLLING_BACK
isn't an allowed state transition into the JobUpdateStateMachine: (https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java#L56).
> > 
> > It looks like a bug that only one of the changeUpdateStatus methods contains the
transition check:
> > 
> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java#L404
> > 
> > I'm concerned that moving a previously terminal state into ROLLING_BACK creates
a bunch of conceptual issues and complexity that will make the codebase (even) harder to understand.
For example, throughout the code (also the UI code) we designate some states as terminal (https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java#L416)
- but in the future we'll have to keep in mind that those states are TERMINAL_BUT_NOT_REALLY.
ROLLED_BACK becomes the only true terminal state in this model. 
> > 
> > I'm not against reusing the rollback functionality built into the scheduler in the
way you're doing, but maybe we should introduce a new STATE to reflect that it was user-initiated
and add it to ALLOWED_TRANSITIONS? I'm not really familiar with this code, so maybe Bill or
Maxim could speak to an approach (or dismiss my concerns ;)).
> 
> Igor Morozov wrote:
>     Would not creating a new state imply the same level of conceptual complexity? You
would need to keep in mind not only to properly handle terminal states but also a new rollbackable
one? May be terminal states are not being useful as a conocept as they are being used only
here https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java#L451
where update job locks being cleand up and it sounds like it should be refactored away (see
Maxim's comment). So what I'm trying to say it would not be too much of a stratch to extend
the semantic of rollbacks and terminal states for update jobs.

They are only used there in the Scheduler, yes. But for things like hooks and other processes
around update lifecycle, the idea that an update can "come back to life" breaks a lot of assumptions.
The use of pulse in your diff is a great example. How is the service that was sending pulses
for the original update supposed to know the update was revived? The service we use to send
pulses needs some signal for when it can stop otherwise it would need to spam forever - it
relies on the terminal states to do that. I see similar problems with the hooks in the CLI
too. 

Again, maybe this is too specific to our tooling and others might disagree with my concerns.
We can always just remove this feature from our CLI.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50168/#review144001
-----------------------------------------------------------


On July 28, 2016, 10:54 p.m., Igor Morozov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50168/
> -----------------------------------------------------------
> 
> (Updated July 28, 2016, 10:54 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, Bill Farner, and Zameer Manji.
> 
> 
> Bugs: AURORA-1721
>     https://issues.apache.org/jira/browse/AURORA-1721
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Add rollback functionality to the scheduler
> 
> Thic change also includes an addition to storage replicated log. I needed a way to re-insert
lock token for an existing job update.
> 
> 
> Diffs
> -----
> 
>   CHANGELOG fc6a46d77ebf889c8f60402c95e8a7472980ba1c 
>   RELEASE-NOTES.md 29d224d5596eb060356f1343cf347db79b1ad687 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 1d66208490aff6ea8af4c737845fa2cf13617529

>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 9e4213f13255a182df938bea44ca87fa03a25318

>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 52c3c6618a3cf1009435ca8a9cece36365913e55

>   src/main/java/org/apache/aurora/scheduler/storage/db/DbJobUpdateStore.java d2673e6b328cb1e249fbe91d18e0d9e935636eaa

>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 39924c62108f6a68f545f90d77ceab4265d41fad

>   src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java d0de063fd78e6c4f62fae4a598d1d22f9775772d

>   src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java b534abf95bab6e1657e3ef993cf34c0d6ec460be

>   src/main/java/org/apache/aurora/scheduler/thrift/aop/AnnotatedAuroraAdmin.java 9243c92b11040b68ed6014b3991db69fc08bcddf

>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java f8357c46df1b025bf4e38a7ce1cb1c13a50c39f9

>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java 364c5c753f884a2d89e27802d7bbf3b2b6d3a08e

>   src/main/python/apache/aurora/client/api/__init__.py 68baf8fdb90cd26100159401c46c9963c24332b3

>   src/main/python/apache/aurora/client/cli/update.py bb526f7bf94d7bfe02fe2786493c85be1bfeb86f

>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java e157c0dfde5efc418448e138aa008ade742fe816

>   src/test/python/apache/aurora/client/api/test_scheduler_client.py afbd385b7eda64cb1f7d118b695e65e4045eac6c

> 
> Diff: https://reviews.apache.org/r/50168/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Igor Morozov
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message