aurora-issues mailing list archives

From "Nathan Howell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-470) Tasks get stuck in THROTTLED state on restart or leader change
Date Fri, 23 May 2014 23:18:04 GMT

    [ https://issues.apache.org/jira/browse/AURORA-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007857#comment-14007857 ]

Nathan Howell commented on AURORA-470:
--------------------------------------

It's from an older build, but I didn't see any obviously related changes or tickets. I turned
down the flapping interval to 10 seconds and started up a service that exits after about 10
seconds.

This is on 7db986e53c74e87ec368e395af55300d1711d261 from late March. I couldn't get a trivial
example to repro on rc0, but I haven't tried one with master failover.

{code}
I0523 20:51:30.002 THREAD18 com.twitter.common.util.StateMachine$Builder$1.execute: SchedulerLifecycle state machine transition STORAGE_PREPARED -> LEADER_AWAITING_REGISTRATION
I0523 20:51:30.002 THREAD18 org.apache.aurora.scheduler.SchedulerLifecycle$6.execute: Elected as leading scheduler!
...
I0523 20:53:17.968 THREAD165 org.apache.aurora.scheduler.MesosSchedulerImpl.statusUpdate: Received status update for task 1400878323661-xxx-0-f11c6fbf-7fe5-4c89-8005-534909443e19 in state TASK_FINISHED with core message Task finished.
I0523 20:53:17.981 THREAD165 com.twitter.common.util.StateMachine$Builder$1.execute: 1400878323661-xxx-0-f11c6fbf-7fe5-4c89-8005-534909443e19 state machine transition RUNNING -> FINISHED
I0523 20:53:17.981 THREAD165 org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work command RESCHEDULE for 1400878323661-xxx-0-f11c6fbf-7fe5-4c89-8005-534909443e19
I0523 20:53:17.981 THREAD165 org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work command SAVE_STATE for 1400878323661-xxx-0-f11c6fbf-7fe5-4c89-8005-534909443e19
I0523 20:53:17.982 THREAD165 org.apache.aurora.scheduler.state.StateManagerImpl$7.apply: Task being rescheduled: 1400878323661-xxx-0-f11c6fbf-7fe5-4c89-8005-534909443e19
I0523 20:53:17.982 THREAD165 org.apache.aurora.scheduler.async.RescheduleCalculator$RescheduleCalculatorImpl.getFlappingPenaltyMs: Ancestor of 1400878323661-xxx-0-f11c6fbf-7fe5-4c89-8005-534909443e19 flapped: 1400878228688-xxx-0-01d4c232-981a-455f-b6d3-43559f1af22a
I0523 20:53:17.982 THREAD165 com.twitter.common.util.StateMachine$Builder$1.execute: 1400878397982-xxx-0-58777fe5-9eef-4a46-a123-8f240169ea86 state machine transition INIT -> THROTTLED
I0523 20:53:17.983 THREAD165 org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work command SAVE_STATE for 1400878397982-xxx-0-58777fe5-9eef-4a46-a123-8f240169ea86
{code}

!http://i.imgur.com/2FWEPdH.png!

> Tasks get stuck in THROTTLED state on restart or leader change
> --------------------------------------------------------------
>
>                 Key: AURORA-470
>                 URL: https://issues.apache.org/jira/browse/AURORA-470
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>    Affects Versions: 0.5.0
>            Reporter: Nathan Howell
>
> We're seeing cases where tasks get stuck in the THROTTLED state indefinitely. From what
> I can tell from the logs, this happens if a task is throttled when Aurora is shut down or a
> new leader is elected.
> It looks like the timer that changes the state from THROTTLED to PENDING is only set up
> on a transition to the THROTTLED state, so it seems like there is no way to get these tasks
> running again except to restart them manually.
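
One way the behavior described above could be avoided is to re-arm the THROTTLED -> PENDING timer for every task that is already THROTTLED when the scheduler starts up or wins an election. The following is only a minimal sketch of that idea, not the actual Aurora fix. Every name in it (ThrottledTaskRecovery, Task, DelayedExecutor, rearmThrottledTasks) is hypothetical, and it assumes that the time a task entered THROTTLED and its flapping penalty can be recovered from storage.

```java
// Hypothetical sketch: none of these names come from the Aurora code base.
class ThrottledTaskRecovery {
    enum Status { PENDING, THROTTLED, RUNNING }

    static final class Task {
        final String id;
        Status status;
        final long throttledAtMs; // assumed: when the task entered THROTTLED
        final long penaltyMs;     // assumed: flapping penalty assigned at that time

        Task(String id, Status status, long throttledAtMs, long penaltyMs) {
            this.id = id;
            this.status = status;
            this.throttledAtMs = throttledAtMs;
            this.penaltyMs = penaltyMs;
        }
    }

    /** Stand-in for a real timer so the sketch stays deterministic. */
    interface DelayedExecutor {
        void schedule(Runnable work, long delayMs);
    }

    /**
     * Intended to run once after storage is loaded (startup or leader change).
     * Schedules the THROTTLED -> PENDING transition for every task that is
     * already THROTTLED, using whatever part of the penalty is still left.
     * Returns the number of tasks that were re-armed.
     */
    static int rearmThrottledTasks(Iterable<Task> tasks, long nowMs, DelayedExecutor executor) {
        int rearmed = 0;
        for (Task task : tasks) {
            if (task.status != Status.THROTTLED) {
                continue;
            }
            long readyAtMs = task.throttledAtMs + task.penaltyMs;
            // A penalty that expired while no leader was active fires immediately.
            long remainingMs = Math.max(0L, readyAtMs - nowMs);
            executor.schedule(() -> {
                // Re-check the state: the task may have been killed in the meantime.
                if (task.status == Status.THROTTLED) {
                    task.status = Status.PENDING;
                }
            }, remainingMs);
            rearmed++;
        }
        return rearmed;
    }

    public static void main(String[] args) {
        Task stuck = new Task("task-a", Status.THROTTLED, 1_000L, 10_000L);
        // The demo executor runs the callback right away instead of waiting.
        int rearmed = rearmThrottledTasks(
            java.util.Collections.singletonList(stuck), 5_000L, (work, delayMs) -> work.run());
        System.out.println(rearmed + " task(s) re-armed; " + stuck.id + " is now " + stuck.status);
    }
}
```

In a real scheduler the DelayedExecutor would presumably be backed by something like a ScheduledExecutorService, and the state change would go through the task state machine rather than a direct field write.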



--
This message was sent by Atlassian JIRA
(v6.2#6252)
