aurora-issues mailing list archives

From "Kevin Sweeney (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AURORA-224) Make health checking more configurable in updater
Date Fri, 21 Feb 2014 00:59:21 GMT
Kevin Sweeney created AURORA-224:
------------------------------------

             Summary: Make health checking more configurable in updater
                 Key: AURORA-224
                 URL: https://issues.apache.org/jira/browse/AURORA-224
             Project: Aurora
          Issue Type: Story
          Components: Client
            Reporter: Kevin Sweeney


Right now the updater treats an instance that passed its health check once but later fails
one as unconditionally failed [1] and restarts it. During startup, a service could conceivably
respond affirmatively to /health and then later time out its requests. Consider making the
behavior of the HTTP health checker more configurable during updates.

[1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/api/instance_watcher.py#L91
{code}
    def maybe_set_instance_unhealthy(instance_id, retriable):
      # An instance that was previously healthy and currently unhealthy has failed.
      if instance_id in instance_states:
        log.info('Instance %s is unhealthy' % instance_id)
        instance_states[instance_id].set_healthy(False)
      # If the restart threshold has expired or if the instance cannot be retried it is unhealthy.
      elif now > expected_healthy_by or not retriable:
        log.info('Instance %s was not reported healthy within %d seconds' % (
          instance_id, self._restart_threshold))
        instance_states[instance_id] = Instance(finished=True)
{code}
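
One possible shape for the requested configurability, as a rough sketch only: tolerate a configurable number of consecutive failed checks after an instance has been seen healthy before treating it as failed. The class HealthTracker, its record method, and the max_consecutive_failures parameter are hypothetical names used for illustration; they are not part of instance_watcher.py. A value of 0 reproduces the behavior described above.

```python
class HealthTracker(object):
  """Hypothetical helper, not part of the Aurora client.

  Tracks health check results per instance and tolerates a configurable
  number of consecutive failures after an instance was first seen healthy.
  """

  def __init__(self, max_consecutive_failures=0):
    # 0 means a single failure after a success marks the instance failed.
    self._max_consecutive_failures = max_consecutive_failures
    self._consecutive_failures = {}  # instance_id -> failures since last success
    self._seen_healthy = set()       # instance ids that passed at least once

  def record(self, instance_id, healthy):
    """Record one health check result.

    Returns True if the instance should now be considered failed.
    """
    if healthy:
      self._seen_healthy.add(instance_id)
      self._consecutive_failures[instance_id] = 0
      return False
    if instance_id not in self._seen_healthy:
      # Never healthy yet: left to the restart threshold logic, which is
      # outside the scope of this sketch.
      return False
    failures = self._consecutive_failures.get(instance_id, 0) + 1
    self._consecutive_failures[instance_id] = failures
    return failures > self._max_consecutive_failures
```

With max_consecutive_failures=2, an instance that passed once could fail two checks in a row without being marked failed; a third consecutive failure would mark it failed, and any passing check in between resets the count.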



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
