aurora-issues mailing list archives

From "Bill Farner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-279) Allow scheduler to decide how to respond to task health check failures
Date Tue, 27 Oct 2015 20:24:27 GMT

    [ https://issues.apache.org/jira/browse/AURORA-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977085#comment-14977085 ]

Bill Farner commented on AURORA-279:
------------------------------------

A concrete scenario is when instances of a service have a synchronized GC (on the JVM) that
causes the executor to think the local instance is unhealthy.  In that scenario, killing all
instances simultaneously is definitely worse than leaving it alone.  Of course, there's a
decent amount of engineering necessary to solve a relatively rare problem.
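
To make the scenario concrete, here is a minimal sketch of the kind of decision a scheduler could make before approving a kill. It is not Aurora's API: ServiceHealth, may_kill, and max_unavailable_fraction are hypothetical names, and the policy (cap how many instances may be restarting at once) is only one assumed way to avoid killing every instance during a synchronized pause.

```python
from dataclasses import dataclass


@dataclass
class ServiceHealth:
    """Hypothetical snapshot of one service as a scheduler might see it."""
    total_instances: int
    unhealthy_instances: int       # instances currently failing health checks
    restarting_instances: int = 0  # instances already killed and not yet back


def may_kill(health: ServiceHealth, max_unavailable_fraction: float = 0.2) -> bool:
    """Return True if one more instance may be killed without exceeding
    the allowed number of simultaneously restarting instances."""
    if health.total_instances <= 0:
        return False
    allowed = int(health.total_instances * max_unavailable_fraction)
    # Always permit at least one restart so a lone bad instance can be replaced.
    allowed = max(allowed, 1)
    return health.restarting_instances < allowed


# All 10 instances report unhealthy at once (e.g. a synchronized GC pause).
# With the assumed 20% cap, kills are approved until 2 instances are restarting.
snapshot = ServiceHealth(total_instances=10, unhealthy_instances=10)
print(may_kill(snapshot))
```

Under this assumed policy a service-wide health check failure results in a bounded, rolling set of restarts rather than a simultaneous kill of every instance.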

> Allow scheduler to decide how to respond to task health check failures
> ----------------------------------------------------------------------
>
>                 Key: AURORA-279
>                 URL: https://issues.apache.org/jira/browse/AURORA-279
>             Project: Aurora
>          Issue Type: Story
>          Components: Executor, Scheduler
>            Reporter: Bill Farner
>            Priority: Minor
>
> The executor is currently autonomous in deciding to kill tasks that have failed health
> checks.  If health check failures synchronize across a service, the service could suffer an
> outage.  SLA considerations may also need to be made before deciding to kill a task for
> health check failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
