Date: Fri, 28 Feb 2014 01:13:21 +0000 (UTC)
From: "Kevin Sweeney (JIRA)"
To: issues@aurora.incubator.apache.org
Subject: [jira] [Assigned] (AURORA-224) Make health checking more configurable in updater

     [ https://issues.apache.org/jira/browse/AURORA-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Sweeney reassigned AURORA-224:
------------------------------------

    Assignee: Kevin Sweeney

> Make health checking more configurable in updater
> -------------------------------------------------
>
>                 Key: AURORA-224
>                 URL: https://issues.apache.org/jira/browse/AURORA-224
>             Project: Aurora
>          Issue Type: Story
>          Components: Client
>            Reporter: Kevin Sweeney
>            Assignee: Kevin Sweeney
>
> Right now the updater treats an instance that passed its health check once but later fails as unconditionally failed [1] and restarts it. During startup a service could conceivably respond affirmatively to /health and then later time out its requests. Consider making the behavior of the HTTP health checker more configurable during updates.
> [1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/api/instance_watcher.py#L91
> {code}
> def maybe_set_instance_unhealthy(instance_id, retriable):
>   # An instance that was previously healthy and currently unhealthy has failed.
>   if instance_id in instance_states:
>     log.info('Instance %s is unhealthy' % instance_id)
>     instance_states[instance_id].set_healthy(False)
>   # If the restart threshold has expired or if the instance cannot be retried it is unhealthy.
>   elif now > expected_healthy_by or not retriable:
>     log.info('Instance %s was not reported healthy within %d seconds' % (
>         instance_id, self._restart_threshold))
>     instance_states[instance_id] = Instance(finished=True)
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
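One way the configurability asked for above might look, as a rough sketch only: tolerate a configurable number of consecutive failed checks before declaring a previously healthy instance failed. The class InstanceHealth and the parameter max_consecutive_failures are hypothetical names chosen for illustration; they are not taken from instance_watcher.py or from any existing Aurora API. With the default of 0 the sketch behaves like the snippet quoted in the issue, where a single failure after a success is final.

```python
class InstanceHealth(object):
  """Hypothetical per-instance tracker (illustration only, not Aurora code)."""

  def __init__(self, max_consecutive_failures=0):
    # Assumed knob: how many failed checks in a row are tolerated.
    if max_consecutive_failures < 0:
      raise ValueError('max_consecutive_failures must be >= 0')
    self._max_consecutive_failures = max_consecutive_failures
    self._consecutive_failures = 0
    self.failed = False

  def record(self, check_passed):
    """Record one health check result; return True while the instance is not failed."""
    if self.failed:
      # Failure is sticky: a failed instance stays failed, since the
      # updater would restart it anyway.
      return False
    if check_passed:
      # Any success resets the run of failures.
      self._consecutive_failures = 0
    else:
      self._consecutive_failures += 1
      if self._consecutive_failures > self._max_consecutive_failures:
        self.failed = True
    return not self.failed


# Default (0 tolerated failures): the first failure after a success is final.
strict = InstanceHealth()
strict_results = [strict.record(r) for r in (True, False, True)]

# Tolerating 2 consecutive failures: only a third failure in a row is final.
lenient = InstanceHealth(max_consecutive_failures=2)
lenient_results = [
    lenient.record(r) for r in (True, False, False, True, False, False, False)]
```

A counter of consecutive failures is only one possible policy; a time-based grace period after startup would address the same /health-then-timeout scenario and might fit the existing restart threshold better.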