aurora-reviews mailing list archives

From Zameer Manji <zma...@apache.org>
Subject Re: Review Request 56723: Add best effort pulse timestamp recovery.
Date Thu, 16 Feb 2017 01:53:04 GMT


> On Feb. 15, 2017, 2:32 p.m., David McLaughlin wrote:
> > A comment on the overall approach: in a healthy system, there is usually only one
> > transition from AWAITING_PULSE -> ROLLING_FORWARD. Meanwhile, there will have been regular
> > pulses that keep the update in ROLLING_FORWARD state. I'd recommend setting the last pulse
> > time to the latest event that happened - because that event is guaranteed to have happened
> > inside a healthy "pulsed" time interval.

Done.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56723/#review165775
-----------------------------------------------------------


On Feb. 15, 2017, 2:09 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56723/
> -----------------------------------------------------------
> 
> (Updated Feb. 15, 2017, 2:09 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
> 
> 
> Bugs: AURORA-1890
>     https://issues.apache.org/jira/browse/AURORA-1890
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Currently the scheduler moves all coordinated ("pulsed") updates into
> ROLL_FORWARD_AWAITING_PULSE or ROLL_BACK_AWAITING_PULSE on scheduler
> startup/recovery. This is because the last pulse timestamp is not durably stored,
> so on startup it is reset to 0L (i.e. no pulse yet).
> 
> In cases where the pulse timeout is large and failover is fast or frequent,
> this causes many updates to transition unnecessarily into a pulse-related state
> until the next pulse.
> 
> It is possible to avoid these unnecessary transitions by traversing the job update
> events and finding the last PULSE -> * state transition. The timestamp of the *
> event indicates that a pulse was received at that point in time and can be used
> to initialize the pulse state on startup.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift efd4e534c4ad90862d7a9fae437ed724da3a34dc
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 49e5b2cfc0b84bb0e0c95cca375cd0503f9dcdb5
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java 729c1234a2e27f1e756ddfd6a4e5a04fa20bbd7f
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java ea0b89a232c2fc10f2183218b750bb0478d51a58
> 
> Diff: https://reviews.apache.org/r/56723/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>
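
For readers following the thread, the two recovery strategies discussed above (the "last PULSE -> * transition" from the description, and the reviewer's "latest event" suggestion) can be sketched roughly as below. This is an illustration only, not the code from the review: the class PulseRecoverySketch, the Event type, and both method names are hypothetical, the Status enum is a small hand-picked subset of the states named in the thread, and the sketch assumes events arrive in chronological order and that "*" means any state that is not itself awaiting a pulse.

```java
import java.util.List;

// Hypothetical sketch; none of these types are Aurora's real classes.
final class PulseRecoverySketch {
  // Value used in the thread for "no pulse yet".
  static final long NO_PULSE_MS = 0L;

  // Hand-picked subset of update states mentioned in the thread.
  enum Status {
    ROLLING_FORWARD,
    ROLLING_BACK,
    ROLL_FORWARD_AWAITING_PULSE,
    ROLL_BACK_AWAITING_PULSE
  }

  // Minimal stand-in for a job update event: a status and when it occurred.
  static final class Event {
    final Status status;
    final long timestampMs;

    Event(Status status, long timestampMs) {
      this.status = status;
      this.timestampMs = timestampMs;
    }
  }

  static boolean isAwaitingPulse(Status status) {
    return status == Status.ROLL_FORWARD_AWAITING_PULSE
        || status == Status.ROLL_BACK_AWAITING_PULSE;
  }

  // Strategy from the description: find the last transition out of an
  // awaiting-pulse state and return the timestamp of the event that follows it.
  // Assumes `events` is sorted oldest to newest.
  static long lastPulseTransitionMs(List<Event> events) {
    long result = NO_PULSE_MS;
    for (int i = 1; i < events.size(); i++) {
      Event previous = events.get(i - 1);
      Event current = events.get(i);
      if (isAwaitingPulse(previous.status) && !isAwaitingPulse(current.status)) {
        result = current.timestampMs;
      }
    }
    return result;
  }

  // Strategy from the review comment: use the timestamp of the latest event.
  static long latestEventMs(List<Event> events) {
    long result = NO_PULSE_MS;
    for (Event event : events) {
      result = Math.max(result, event.timestampMs);
    }
    return result;
  }
}
```

With an update that left ROLL_FORWARD_AWAITING_PULSE once and then kept progressing, the first method returns the time of that single early transition, while the second returns the most recent event time. That difference is what the review comment points at: in a healthy update the only PULSE -> * transition may be far in the past. Both methods fall back to 0L when there is nothing to recover from, which matches the "best effort" framing of the change.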

