aurora-issues mailing list archives

From "Maxim Khutornenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1404) Reconcile ASSIGNED tasks that have not transitioned to STARTING
Date Mon, 27 Jul 2015 18:44:05 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643199#comment-14643199
] 

Maxim Khutornenko commented on AURORA-1404:
-------------------------------------------

The response time for stuck ASSIGNED tasks can be improved via AURORA-1370. I think it's generally
more robust to kill/reschedule an ASSIGNED task instead of retrying a {{launchTasks}} call
for something that's already in-flight.

> Reconcile ASSIGNED tasks that have not transitioned to STARTING
> ---------------------------------------------------------------
>
>                 Key: AURORA-1404
>                 URL: https://issues.apache.org/jira/browse/AURORA-1404
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Joshua Cohen
>
> If the Mesos master fails over after Aurora moves a task to {{ASSIGNED}} but before the
> slave receives the message, the task never transitions and is eventually timed out by
> [TaskTimeout|https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/TaskTimeout.java].
> Instead, it would be better to have a mechanism similar to
> [KillRetry|https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/KillRetry.java]
> that checks whether assigned tasks have transitioned to a running state and, if they have
> not, transitions them to {{LOST}}.
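
The mechanism proposed in the issue description could be sketched roughly as below. This is only an illustration under stated assumptions: the class {{AssignedTaskReconciler}}, its methods, and the simplified {{Status}} enum are hypothetical names made up for this example and are not Aurora's scheduler classes, and time is passed in explicitly rather than read from a clock. The idea is to remember when each task entered {{ASSIGNED}} and, on a periodic pass, mark as {{LOST}} any task that is still {{ASSIGNED}} after a grace period.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Aurora code base.
class AssignedTaskReconciler {
  // Simplified subset of task states, for illustration only.
  enum Status { ASSIGNED, STARTING, RUNNING, LOST }

  private final Map<String, Status> statuses = new HashMap<>();
  private final Map<String, Long> assignedAtMs = new HashMap<>();
  private final long timeoutMs;

  AssignedTaskReconciler(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  // Record that a task entered ASSIGNED at the given time.
  public void assign(String taskId, long nowMs) {
    statuses.put(taskId, Status.ASSIGNED);
    assignedAtMs.put(taskId, nowMs);
  }

  // Apply a status update; any state other than ASSIGNED stops the tracking.
  public void update(String taskId, Status status) {
    statuses.put(taskId, status);
    if (status != Status.ASSIGNED) {
      assignedAtMs.remove(taskId);
    }
  }

  // Mark every task still ASSIGNED after the grace period as LOST and
  // return the ids of the tasks that were transitioned.
  public List<String> reconcile(long nowMs) {
    List<String> lost = new ArrayList<>();
    for (Map.Entry<String, Long> entry : assignedAtMs.entrySet()) {
      boolean stillAssigned = statuses.get(entry.getKey()) == Status.ASSIGNED;
      if (stillAssigned && nowMs - entry.getValue() >= timeoutMs) {
        lost.add(entry.getKey());
      }
    }
    // Mutate the maps only after iteration has finished.
    for (String taskId : lost) {
      statuses.put(taskId, Status.LOST);
      assignedAtMs.remove(taskId);
    }
    return lost;
  }

  public Status statusOf(String taskId) {
    return statuses.get(taskId);
  }
}
```

In this sketch a caller would invoke {{reconcile}} on a timer. A task that has already reported {{STARTING}} is left alone, and a stuck task is moved to {{LOST}} rather than having its launch retried, which matches the preference stated in the comment above.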



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
