aurora-issues mailing list archives

From "Chris Lambert (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AURORA-1023) Releasing the update lock trips off scheduler updater
Date Tue, 20 Jan 2015 20:29:34 GMT

     [ https://issues.apache.org/jira/browse/AURORA-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Lambert updated AURORA-1023:
----------------------------------
    Story Points: 5

> Releasing the update lock trips off scheduler updater
> -----------------------------------------------------
>
>                 Key: AURORA-1023
>                 URL: https://issues.apache.org/jira/browse/AURORA-1023
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Bill Farner
>            Priority: Critical
>              Labels: twitter
>
> Here is the faulty sequence:
> - User starts a scheduler job update and pauses it while it's still in progress
> - User runs the "aurora job cancel-update" command, thus releasing the update lock
> - User starts a new scheduler job update
> At this point, any attempt to abort or pause an active update results in the
> following error [1]:
> {noformat}
> vagrant@vagrant-ubuntu-trusty-64:~$ aurora beta-update abort devcluster/www-data/prod/hello
>  INFO] Aborting update for: devcluster/www-data/prod/hello
> Failed to abort update due to error:
> 	expected one element but was: <JobUpdateSummary(updateId:4b7fdc14-428f-44e4-9261-908b606f47e2, jobKey:JobKey(role:www-data, environment:prod, name:hello), user:UNSECURE, state:JobUpdateState(status:ROLLING_FORWARD, createdTimestampMs:1421450382234, lastModifiedTimestampMs:1421450382234)), JobUpdateSummary(updateId:3c9c2fa2-8e51-4c13-8440-94364205a37b, jobKey:JobKey(role:www-data, environment:prod, name:hello), user:UNSECURE, state:JobUpdateState(status:ROLL_FORWARD_PAUSED, createdTimestampMs:1421450304935, lastModifiedTimestampMs:1421450324055))>
> {noformat}
> The only way to recover from this state is either to wait for the active job update
> to reach a terminal state or to force it into one by running another cancel-update.
> While "cancel-update" will eventually go away along with the client updater, we do
> have a problem during the migration period. A possible (though ugly) short-term
> workaround could be to call "abortJobUpdate" from the "releaseLock" RPC.
> [1] - https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java#L295-L296
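
The faulty sequence and the proposed workaround can be modelled with a small in-memory sketch. This is not Aurora's scheduler code: ToyScheduler, abort_on_release, and reproduce are hypothetical names, and the error text only imitates the message quoted in the report. The assumption is that abort and pause look up "the" active update for a job and fail when more than one exists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Statuses treated as "active" in this toy model (taken from the report above).
ACTIVE = {"ROLLING_FORWARD", "ROLL_FORWARD_PAUSED"}


@dataclass
class ToyScheduler:
    """Hypothetical in-memory model; not Aurora's actual implementation."""

    abort_on_release: bool = False  # models the proposed short-term workaround
    locks: Set[str] = field(default_factory=set)
    updates: List[Dict[str, str]] = field(default_factory=list)

    def start_update(self, job: str) -> None:
        # A new update is only refused while the job lock is held.
        if job in self.locks:
            raise RuntimeError(f"{job} is locked by another update")
        self.locks.add(job)
        self.updates.append({"job": job, "status": "ROLLING_FORWARD"})

    def _only_active(self, job: str) -> Dict[str, str]:
        # Assumed lookup: exactly one active update per job is expected.
        active = [u for u in self.updates
                  if u["job"] == job and u["status"] in ACTIVE]
        if len(active) != 1:
            raise ValueError(f"expected one element but was: {active}")
        return active[0]

    def pause(self, job: str) -> None:
        self._only_active(job)["status"] = "ROLL_FORWARD_PAUSED"

    def abort(self, job: str) -> None:
        self._only_active(job)["status"] = "ABORTED"
        self.locks.discard(job)

    def release_lock(self, job: str) -> None:
        # Without the workaround, the paused update stays active after release.
        if self.abort_on_release:
            for update in self.updates:
                if update["job"] == job and update["status"] in ACTIVE:
                    update["status"] = "ABORTED"
        self.locks.discard(job)


def reproduce(scheduler: ToyScheduler, job: str) -> None:
    """Replay the sequence from the report against the toy model."""
    scheduler.start_update(job)   # user starts an update
    scheduler.pause(job)          # ...and pauses it
    scheduler.release_lock(job)   # cancel-update releases the lock
    scheduler.start_update(job)   # user starts a second update
    scheduler.abort(job)          # abort must now pick one active update
```

In this model, reproduce(ToyScheduler(), job) raises ValueError because two updates are active for the same job, while reproduce(ToyScheduler(abort_on_release=True), job) completes, since releasing the lock also terminates the paused update.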



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
