Return-Path:
X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io
Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183])
    by cust-asf2.ponee.io (Postfix) with ESMTP id B6B19200C2B
    for ; Thu, 16 Feb 2017 03:24:50 +0100 (CET)
Received: by cust-asf.ponee.io (Postfix)
    id B54A5160B70; Thu, 16 Feb 2017 02:24:50 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by cust-asf.ponee.io (Postfix) with SMTP id 0A742160B5E
    for ; Thu, 16 Feb 2017 03:24:49 +0100 (CET)
Received: (qmail 4229 invoked by uid 500); 16 Feb 2017 02:24:49 -0000
Mailing-List: contact reviews-help@aurora.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: reviews@aurora.apache.org
Delivered-To: mailing list reviews@aurora.apache.org
Received: (qmail 4207 invoked by uid 99); 16 Feb 2017 02:24:48 -0000
Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 16 Feb 2017 02:24:48 +0000
Received: from reviews.apache.org (localhost [127.0.0.1])
    by reviews.apache.org (Postfix) with ESMTP id B7AC231F743;
    Thu, 16 Feb 2017 02:24:47 +0000 (UTC)
Content-Type: multipart/alternative; boundary="===============1494646418082315444=="
MIME-Version: 1.0
Subject: Re: Review Request 56723: Add best effort pulse timestamp recovery.
From: Zameer Manji
To: Santhosh Kumar Shanmugham, David McLaughlin
Cc: Aurora ReviewBot, Zameer Manji, Aurora
Date: Thu, 16 Feb 2017 02:24:47 -0000
Message-ID: <20170216022447.31018.68802@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
Auto-Submitted: auto-generated
Sender: Zameer Manji
X-ReviewGroup: Aurora
X-Auto-Response-Suppress: DR, RN, OOF, AutoReply
X-ReviewRequest-URL: https://reviews.apache.org/r/56723/
X-Sender: Zameer Manji
References: <20170216020059.31018.73682@reviews.apache.org>
In-Reply-To: <20170216020059.31018.73682@reviews.apache.org>
Reply-To: Zameer Manji
X-ReviewRequest-Repository: aurora
archived-at: Thu, 16 Feb 2017 02:24:50 -0000

--===============1494646418082315444==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56723/
-----------------------------------------------------------

(Updated Feb. 15, 2017, 6:24 p.m.)


Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.


Bugs: AURORA-1890
    https://issues.apache.org/jira/browse/AURORA-1890


Repository: aurora


Description
-------

Currently the scheduler forces all coordinated ("pulsed") updates into
ROLL_FORWARD_AWAITING_PULSE or ROLL_BACK_AWAITING_PULSE on scheduler
startup/recovery. This happens because the last pulse timestamp is not durably
stored, so on recovery it is reset to 0L (i.e. no pulse yet).

In cases where the pulse timeout is large and failover is fast or frequent,
this causes many updates to transition unnecessarily into a pulse-related
state until the next pulse arrives.

It is possible to avoid these unnecessary transitions by traversing the job
update events and initializing the last pulse timestamp to the timestamp of
the last event, provided the last event was not a pulse event.

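The recovery idea in the description can be sketched roughly as follows. This
is a minimal illustration under stated assumptions, not the code from the diff:
the class PulseRecoverySketch, the Event type, the trimmed-down Status enum and
the method recoverLastPulseMs are all hypothetical names, and "pulse event" is
read here as an event whose status is one of the two AWAITING_PULSE states.

```java
import java.util.Arrays;
import java.util.List;

public class PulseRecoverySketch {
  // Simplified stand-in for the update statuses. Only the two
  // AWAITING_PULSE names are taken from the description above.
  public enum Status {
    ROLLING_FORWARD,
    ROLLING_BACK,
    ROLL_FORWARD_AWAITING_PULSE,
    ROLL_BACK_AWAITING_PULSE
  }

  // Hypothetical job update event: a status plus the time it was recorded.
  public static final class Event {
    public final Status status;
    public final long timestampMs;

    public Event(Status status, long timestampMs) {
      this.status = status;
      this.timestampMs = timestampMs;
    }
  }

  // 0L means "no pulse yet", matching the description.
  public static final long NO_PULSE = 0L;

  // Best-effort recovery: events are assumed to be ordered oldest first.
  // If the most recent event is not an awaiting-pulse state, its timestamp
  // is used as the last pulse timestamp; otherwise fall back to NO_PULSE.
  public static long recoverLastPulseMs(List<Event> events) {
    if (events.isEmpty()) {
      return NO_PULSE;
    }
    Event last = events.get(events.size() - 1);
    boolean awaitingPulse =
        last.status == Status.ROLL_FORWARD_AWAITING_PULSE
            || last.status == Status.ROLL_BACK_AWAITING_PULSE;
    return awaitingPulse ? NO_PULSE : last.timestampMs;
  }

  public static void main(String[] args) {
    List<Event> events = Arrays.asList(
        new Event(Status.ROLL_FORWARD_AWAITING_PULSE, 1000L),
        new Event(Status.ROLLING_FORWARD, 1500L));
    System.out.println(recoverLastPulseMs(events));
  }
}
```

With this reading, an update that was actively rolling at failover resumes with
a recent pulse timestamp and is only blocked once the normal pulse timeout
elapses, while an update that was already awaiting a pulse stays blocked.
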
Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift efd4e534c4ad90862d7a9fae437ed724da3a34dc
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 49e5b2cfc0b84bb0e0c95cca375cd0503f9dcdb5
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java 729c1234a2e27f1e756ddfd6a4e5a04fa20bbd7f
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java ea0b89a232c2fc10f2183218b750bb0478d51a58

Diff: https://reviews.apache.org/r/56723/diff/


Testing
-------


Thanks,

Zameer Manji

--===============1494646418082315444==--