aurora-issues mailing list archives

From "Joe Smith (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AURORA-1514) Allow users to give guidance on SLA for their job
Date Thu, 08 Oct 2015 13:49:27 GMT
Joe Smith created AURORA-1514:
---------------------------------

             Summary: Allow users to give guidance on SLA for their job
                 Key: AURORA-1514
                 URL: https://issues.apache.org/jira/browse/AURORA-1514
             Project: Aurora
          Issue Type: Story
          Components: Maintenance, SRE
            Reporter: Joe Smith


There needs to be a standard process for customizing the SLA used to validate that a task on a
host can be killed in order to drain that host into maintenance. Right now, the default is [95% over
30 minutes|https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/admin/admin_util.py#L35],
but certain services (such as memcache) would fare much better under a different SLA,
for example 99% over 5 minutes.
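
To make the "percentage over duration" pair concrete, here is a minimal sketch of such a check. It assumes the SLA means "at least this percentage of a job's tasks have been up for at least this duration", and that the task being killed is one of the healthy ones (the worst case). The function name {{safe_to_kill}} and its parameters are hypothetical and are not part of the Aurora client.

```python
def safe_to_kill(task_uptimes_secs, percentage=95.0, duration_secs=30 * 60):
    """Return True if killing one task should keep the job within its SLA.

    Assumption: the SLA holds when at least `percentage` percent of the
    job's tasks have been up for at least `duration_secs` seconds.
    """
    total = len(task_uptimes_secs)
    if total == 0:
        # Nothing is running, so there is nothing to protect.
        return True
    healthy = sum(1 for uptime in task_uptimes_secs if uptime >= duration_secs)
    # Worst case: the task we kill is one of the healthy ones.
    remaining = max(healthy - 1, 0)
    return 100.0 * remaining / total >= percentage


# Default SLA (95% over 30 minutes), 100 tasks that have all been up for an hour.
print(safe_to_kill([3600] * 100))
# A stricter, shorter window: 99% over 5 minutes.
print(safe_to_kill([3600] * 100, percentage=99.0, duration_secs=5 * 60))
```

Under these assumptions, a job with only 10 instances could never lose a task under the 95% default, which is one reason a per-job setting may be useful.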

We could build this tooling [around the existing {{aurora_admin drain_hosts}}|https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/admin/admin_util.py#L75],
but it would apply to all tasks on that host, which would increase complexity.

Lastly, if we decide to make this user-settable rather than operator-whitelistable, it is important
that we still put firm limits on acceptable values to prevent a service from
setting 100% over 0 minutes and holding hosts hostage.
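
One way such limits might look is sketched below. The bounds ({{MAX_PERCENTAGE}}, {{MIN_DURATION_SECS}}) and the {{SlaGuidance}} type are illustrative assumptions only. The defaults mirror the 95% over 30 minutes mentioned above, and none of these names exist in Aurora today.

```python
from dataclasses import dataclass

# Hypothetical operator-defined limits; real values would need discussion.
MAX_PERCENTAGE = 99.9
MIN_DURATION_SECS = 60

# Defaults taken from the current behaviour described in this ticket.
DEFAULT_PERCENTAGE = 95.0
DEFAULT_DURATION_SECS = 30 * 60


@dataclass(frozen=True)
class SlaGuidance:
    """User-supplied SLA hint for a job (hypothetical structure)."""
    percentage: float = DEFAULT_PERCENTAGE
    duration_secs: int = DEFAULT_DURATION_SECS


def validate_sla(guidance):
    """Return a list of human-readable problems; an empty list means acceptable."""
    errors = []
    if not 0 < guidance.percentage <= MAX_PERCENTAGE:
        errors.append(
            "percentage must be in (0, %s], got %s" % (MAX_PERCENTAGE, guidance.percentage))
    if guidance.duration_secs < MIN_DURATION_SECS:
        errors.append(
            "duration must be at least %s seconds, got %s"
            % (MIN_DURATION_SECS, guidance.duration_secs))
    return errors


# The "hostage" setting from the text is rejected on both counts.
for problem in validate_sla(SlaGuidance(percentage=100.0, duration_secs=0)):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a client report every out-of-range value in a single pass.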



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
