Content-Type: multipart/alternative; boundary="===============1725824411399788809=="
MIME-Version: 1.0
Subject: Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.
From: Joshua Cohen
To: Joshua Cohen, David McLaughlin, Stephan Erb, Zameer Manji
Cc: Santhosh Kumar Shanmugham, Aurora
Date: Fri, 02 Dec 2016 21:44:56 -0000
Message-ID: <20161202214456.1642.11927@reviews.apache.org>
In-Reply-To: <20161202084349.1642.98328@reviews.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/54299/

--===============1725824411399788809==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review157764
-----------------------------------------------------------


src/main/python/apache/aurora/executor/common/health_checker.py (line 113)

s/suppose/supposed


src/main/python/apache/aurora/executor/common/health_checker.py (lines 115 - 117)

There is still a chance of a backwards incompatibility here. Under the previous watch-driven updates, a task could flip between failing and passing health checks, and as long as it was still running at the end of `watch_secs`, the updater would consider it healthy and move on. With the new behavior, someone could configure a task so that the max attempts are consumed without reaching either `max_consecutive_failures` or `min_consecutive_successes` before `watch_secs` has elapsed, meaning the task would fail.
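
To make the failure mode above concrete, here is a minimal sketch of an attempt-limited health check loop. It is not the executor's actual code: `simulate_health_checks`, its parameters, and the rule that a task fails once its consecutive failures exceed `max_consecutive_failures` are all assumptions made for illustration.

```python
def simulate_health_checks(outcomes, max_attempts,
                           max_consecutive_failures,
                           min_consecutive_successes):
    """Hypothetical model of an attempt-limited health check loop.

    `outcomes` is a list of booleans (True = check passed). Returns
    "healthy", "failed", or "exhausted" (attempts ran out undecided).
    """
    failures = 0
    successes = 0
    for ok in outcomes[:max_attempts]:
        if ok:
            successes += 1
            failures = 0
            if successes >= min_consecutive_successes:
                return "healthy"
        else:
            failures += 1
            successes = 0
            # Assumption: the task is unhealthy once consecutive
            # failures exceed the configured maximum.
            if failures > max_consecutive_failures:
                return "failed"
    return "exhausted"


# A task that alternates between failing and passing never reaches
# either threshold, so the attempt budget runs out first.
print(simulate_health_checks([False, True] * 3, max_attempts=6,
                             max_consecutive_failures=2,
                             min_consecutive_successes=2))
```

In this sketch the alternating task ends up `exhausted`. Under watch-driven updates the same task would have been considered healthy as long as it was still running when `watch_secs` elapsed.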

As we discussed earlier, if we make `watch_secs` and `min_consecutive_successes` mutually exclusive in the client, then the executor would trigger the new behavior only if the user opted in by setting `watch_secs` to 0 and `min_consecutive_successes` to a non-zero value.


- Joshua Cohen


On Dec. 2, 2016, 8:43 a.m., Santhosh Kumar Shanmugham wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54299/
> -----------------------------------------------------------
>
> (Updated Dec. 2, 2016, 8:43 a.m.)
>
>
> Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and Zameer Manji.
>
>
> Bugs: AURORA-1841
>     https://issues.apache.org/jira/browse/AURORA-1841
>
>
> Repository: aurora
>
>
> Description
> -------
>
> It is possible to set the health checks such that a task can
> continually fail health checks with intermittent successes and still
> succeed an update. Essentially a task fails health checks during the
> `initial_interval_secs` and an additional `max_consecutive_failures`,
> and then performs a successful health check to become healthy.
>
> To be backward compatible with the above configuration, include the
> `max_consecutive_failures` when computing `max_attempts_to_running`.
>
>
> Diffs
> -----
>
>   docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b
>   src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49
>   src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2
>
> Diff: https://reviews.apache.org/r/54299/diff/
>
>
> Testing
> -------
>
> build-support/jenkins/build.sh
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>
>
> Thanks,
>
> Santhosh Kumar Shanmugham
>
>

--===============1725824411399788809==--
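
The opt-in rule proposed in the reply (mutually exclusive `watch_secs` and `min_consecutive_successes`) could look roughly like the sketch below. Both function names are hypothetical and do not come from the Aurora client or executor. The sketch only encodes the two conditions stated in the review comment.

```python
def validate_client_config(watch_secs, min_consecutive_successes):
    """Hypothetical client-side check: the two settings may not both be set."""
    if watch_secs > 0 and min_consecutive_successes > 0:
        raise ValueError(
            "watch_secs and min_consecutive_successes are mutually exclusive")


def uses_health_check_driven_updates(watch_secs, min_consecutive_successes):
    """Hypothetical executor-side test for an explicit opt-in.

    The new behavior applies only when watch_secs is 0 and
    min_consecutive_successes is non-zero.
    """
    return watch_secs == 0 and min_consecutive_successes > 0
```

A config that keeps a non-zero `watch_secs` and leaves `min_consecutive_successes` at 0 would pass validation and stay on the old watch-driven path, which is what keeps existing jobs backwards compatible under this proposal.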