aurora-issues mailing list archives

From "Jordan Ly (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AURORA-1932) Failure accrual detection mechanism for bad agents
Date Mon, 05 Jun 2017 16:48:04 GMT

     [ https://issues.apache.org/jira/browse/AURORA-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordan Ly updated AURORA-1932:
------------------------------
    Issue Type: Task  (was: Story)

> Failure accrual detection mechanism for bad agents
> --------------------------------------------------
>
>                 Key: AURORA-1932
>                 URL: https://issues.apache.org/jira/browse/AURORA-1932
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Jordan Ly
>            Assignee: Jordan Ly
>
> With the introduction of different OfferManager orderings (see https://reviews.apache.org/r/59480/),
> we run the risk of repeatedly assigning the same task to a bad agent.
> We should develop a 'failure accrual' mechanism that tracks how many times tasks fail on
> an agent. If the count reaches a threshold, we should blacklist that agent for some time so
> that it can be investigated and the task can be assigned to a different agent.
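
The mechanism described in the issue could be sketched roughly as follows. This is an illustration only, not the Aurora scheduler's implementation: the class name FailureAccrualTracker, its methods, and the default threshold and blacklist duration are all hypothetical, and resetting the count after a successful task is an added assumption that the issue does not specify.

```python
import time
from typing import Callable, Dict


class FailureAccrualTracker:
    """Hypothetical sketch: count task failures per agent and blacklist
    an agent for a fixed period once the count reaches a threshold."""

    def __init__(self, threshold: int = 3, blacklist_secs: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self._threshold = threshold            # assumed default, not from the issue
        self._blacklist_secs = blacklist_secs  # assumed default, not from the issue
        self._clock = clock                    # injectable so tests can control time
        self._failures: Dict[str, int] = {}
        self._blacklisted_until: Dict[str, float] = {}

    def record_failure(self, agent_id: str) -> None:
        """Accrue one failure; blacklist the agent when the threshold is hit."""
        count = self._failures.get(agent_id, 0) + 1
        if count >= self._threshold:
            self._blacklisted_until[agent_id] = self._clock() + self._blacklist_secs
            # Start from a clean slate once the blacklist period ends.
            self._failures.pop(agent_id, None)
        else:
            self._failures[agent_id] = count

    def record_success(self, agent_id: str) -> None:
        """Assumption: a successful task clears the agent's accrued failures."""
        self._failures.pop(agent_id, None)

    def is_blacklisted(self, agent_id: str) -> bool:
        """Return True while the agent's blacklist period has not expired."""
        until = self._blacklisted_until.get(agent_id)
        if until is None:
            return False
        if self._clock() >= until:
            # The period has elapsed, so the agent may receive tasks again.
            del self._blacklisted_until[agent_id]
            return False
        return True
```

Under these assumptions, a scheduler would call record_failure whenever a task fails on an agent and check is_blacklisted before assigning a task to an offer from that agent, so that the task can be placed on a different agent in the meantime.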



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
