aurora-issues mailing list archives

From "Stephan Erb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1500) Platform SLA gets stuck in DOWN when a replacement PENDING is killed
Date Mon, 08 Feb 2016 22:44:39 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137911#comment-15137911 ]

Stephan Erb commented on AURORA-1500:
-------------------------------------

Relevant piece of code responsible for the untraceable deletes of PENDING tasks: https://github.com/apache/aurora/blob/9ed81a7db58f6a7cb308c8ac6a545705351c8c0e/src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java#L442
(thanks Maxim for pointing out :-)

> Platform SLA gets stuck in DOWN when a replacement PENDING is killed
> --------------------------------------------------------------------
>
>                 Key: AURORA-1500
>                 URL: https://issues.apache.org/jira/browse/AURORA-1500
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>
> The way platform SLA calculation is currently done cannot account for special cases
> in which killed tasks leave no history behind. One example: a task gets LOST (the SLA DOWN
> interval starts) and its replacement is scheduled immediately. If, however, the replacement
> task is killed while still PENDING, no history is left to close the DOWN interval, and
> the platform SLA stays degraded until either the user creates a new task for the same
> instance or the task history is purged.
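The failure mode above can be illustrated with a small, hypothetical Java model. It assumes that a DOWN interval opens when a task goes LOST and is closed by the next RUNNING or KILLED event in the instance's retained history. The class, record, and method names (`PlatformSlaSketch`, `Event`, `downtimeMs`) are made up for illustration and are not Aurora's actual SLA code. The point it shows: once the killed PENDING replacement leaves no event behind, nothing closes the interval, and downtime keeps accruing up to "now".

```java
import java.util.List;

// Hypothetical, simplified model of a per-instance platform SLA check.
// Names and the interval-closing rule are illustrative assumptions,
// not Aurora's actual implementation.
public final class PlatformSlaSketch {
  enum Status { PENDING, RUNNING, LOST, KILLED }

  record Event(long timestampMs, Status status) {}

  // Sums DOWN time for one instance. A DOWN interval opens when a task goes
  // LOST and (by assumption) closes at the next RUNNING or KILLED event in
  // the retained history. An interval that never closes runs until nowMs.
  static long downtimeMs(List<Event> history, long nowMs) {
    long total = 0;
    Long downSince = null;
    for (Event e : history) {
      if (e.status() == Status.LOST) {
        if (downSince == null) {
          downSince = e.timestampMs();
        }
      } else if (downSince != null
          && (e.status() == Status.RUNNING || e.status() == Status.KILLED)) {
        total += e.timestampMs() - downSince;
        downSince = null;
      }
    }
    if (downSince != null) {
      total += nowMs - downSince; // still DOWN: nothing in history closed it
    }
    return total;
  }

  public static void main(String[] args) {
    // Original task goes LOST at t=100; its replacement goes PENDING at
    // t=101 and is killed at t=150, with the kill recorded in history.
    List<Event> withKillRecorded = List.of(
        new Event(0, Status.RUNNING),
        new Event(100, Status.LOST),
        new Event(101, Status.PENDING),
        new Event(150, Status.KILLED));
    // Same scenario, but the PENDING replacement was deleted without a trace.
    List<Event> withKillDropped = List.of(
        new Event(0, Status.RUNNING),
        new Event(100, Status.LOST));

    System.out.println(downtimeMs(withKillRecorded, 10_000)); // 50
    System.out.println(downtimeMs(withKillDropped, 10_000));  // 9900
  }
}
```

In this model, the only difference between the two cases is whether the replacement's KILLED event survives. Dropping it turns a 50 ms outage into one that grows without bound until the history is purged or a new task for the instance shows up.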



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
