Date: Fri, 2 Dec 2016 23:53:58 +0000 (UTC)
From: "Santhosh Kumar Shanmugham (JIRA)"
To: issues@aurora.apache.org
Reply-To: dev@aurora.apache.org
Subject: [jira] [Created] (AURORA-1844) Force a snapshot at the end of startup.

Santhosh Kumar Shanmugham created AURORA-1844:
-------------------------------------------------

             Summary: Force a snapshot at the end of startup.
                 Key: AURORA-1844
                 URL: https://issues.apache.org/jira/browse/AURORA-1844
             Project: Aurora
          Issue Type: Task
            Reporter: Santhosh Kumar Shanmugham
            Priority: Minor

When the scheduler starts up, it replays the entries from the replicated log to catch up with the current state before announcing itself as the leader to the outside world. If, for any reason, the scheduler dies after this replay and after appending more log entries, the next startup has to redo all of that work. This becomes a problem when the amount of additional work is not trivial, and it can send the scheduler into a death spiral. One example of this is when the TaskHistoryPruner cleans up the DB but, in doing so, adds to the log entries.

To avoid the repeated work, the scheduler should force a snapshot after the initial replay.
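
To illustrate the failure mode described above, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Aurora's storage code: `ToyLog`, `startup`, and `simulate` are hypothetical names, the "log" is an in-memory list, and a crash is modelled as simply starting up again after some post-replay work has been appended.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Hypothetical stand-in for the replicated log: a snapshot plus later entries."""
    snapshot: dict = field(default_factory=dict)
    entries: list = field(default_factory=list)  # (key, value) ops since the snapshot

    def append(self, key, value):
        self.entries.append((key, value))

    def write_snapshot(self, state):
        # A snapshot captures the full state, so the entries before it can be dropped.
        self.snapshot = dict(state)
        self.entries.clear()


def startup(log, force_snapshot):
    """Replay the log into memory; return (state, number of entries replayed)."""
    state = dict(log.snapshot)
    replayed = 0
    for key, value in log.entries:
        if value is None:
            state.pop(key, None)  # None marks a deletion (e.g. a pruned task)
        else:
            state[key] = value
        replayed += 1
    if force_snapshot:
        log.write_snapshot(state)
    return state, replayed


def simulate(force_snapshot, backlog=1000, restarts=3, pruned_per_run=100):
    """Die after each startup's post-replay work; return entries replayed per startup."""
    log = ToyLog()
    for i in range(backlog):
        log.append(f"task-{i}", "FINISHED")
    costs = []
    for _ in range(restarts):
        state, replayed = startup(log, force_snapshot)
        costs.append(replayed)
        # Post-replay work (pruning here) appends more entries, then the process dies.
        for key in list(state)[:pruned_per_run]:
            log.append(key, None)
    return costs


if __name__ == "__main__":
    print("no snapshot:    ", simulate(force_snapshot=False))
    print("forced snapshot:", simulate(force_snapshot=True))
```

In this toy model, the run without a snapshot replays 1000, 1100, and then 1200 entries across three startups, so each crash makes the next startup more expensive. With the forced snapshot, the replay cost is 1000, 100, and 100, because only the entries appended after the snapshot are replayed.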