Date: Wed, 24 Sep 2014 20:40:34 +0000 (UTC)
From: "Sergey Edunov (JIRA)"
To: giraph-dev@incubator.apache.org
Subject: [jira] [Created] (GIRAPH-950) Auto-restart from checkpoint doesn't pick up latest checkpoint

Sergey Edunov created GIRAPH-950:
------------------------------------

Summary: Auto-restart from checkpoint doesn't pick up latest checkpoint
Key: GIRAPH-950
URL: https://issues.apache.org/jira/browse/GIRAPH-950
Project: Giraph
Issue Type: Bug
Reporter: Sergey Edunov

While running different jobs with checkpoints enabled, I noticed some issues:

1) The way we pick up the latest checkpoint is not correct.
   The current implementation just picks whatever is returned last from FileSystem.list(), which is not necessarily the latest checkpoint.
2) If a job restarts from a checkpoint, it immediately creates another checkpoint.
3) We need more flexibility in GiraphJobRetryChecker to allow restarts after multiple failures.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
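Issue 1 above comes down to relying on listing order, which a file system generally does not guarantee. A minimal sketch of an order-independent alternative: parse the superstep number out of each checkpoint file name and take the numeric maximum. The `<superstep>.metadata` naming scheme and the `LatestCheckpoint` class are hypothetical, chosen for illustration; Giraph's real checkpoint layout and code may differ. The sketch works on plain strings so it runs without Hadoop on the classpath.

```java
import java.util.List;
import java.util.OptionalLong;

public class LatestCheckpoint {
  // Hypothetical marker for a completed checkpoint file; not necessarily Giraph's real naming.
  static final String SUFFIX = ".metadata";

  // Returns the highest superstep among names like "<superstep>.metadata",
  // independent of the order in which the file system listed them.
  static OptionalLong latestSuperstep(List<String> fileNames) {
    return fileNames.stream()
        .filter(name -> name.endsWith(SUFFIX))
        .map(name -> name.substring(0, name.length() - SUFFIX.length()))
        .filter(prefix -> prefix.matches("\\d{1,18}")) // digits only, small enough for a long
        .mapToLong(Long::parseLong)
        .max(); // empty if no checkpoint file was found
  }

  public static void main(String[] args) {
    // Arbitrary listing order: taking the last entry would yield superstep 9,
    // and a plain string sort would also rank "9.metadata" above "10.metadata".
    List<String> listed = List.of("10.metadata", "2.metadata", "_temporary", "9.metadata");
    System.out.println(latestSuperstep(listed).getAsLong()); // prints 10
  }
}
```

Comparing parsed numbers rather than names matters: even sorting the listing as strings would pick the wrong checkpoint once supersteps reach two digits.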
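For issue 2, one possible guard is to skip checkpointing at the superstep the job was just restored from, since that state already exists on disk. This is a sketch under stated assumptions: checkpoints are written every `frequency` supersteps, a restored job resumes at the superstep number its checkpoint was taken at, and the `CheckpointPolicy` class and `restartedFrom` field are hypothetical names, not Giraph's API.

```java
public class CheckpointPolicy {
  private final int frequency;      // write a checkpoint every `frequency` supersteps (must be > 0)
  private final long restartedFrom; // hypothetical: superstep restored from, or -1 for a fresh run

  CheckpointPolicy(int frequency, long restartedFrom) {
    if (frequency <= 0) {
      throw new IllegalArgumentException("frequency must be positive");
    }
    this.frequency = frequency;
    this.restartedFrom = restartedFrom;
  }

  boolean shouldCheckpoint(long superstep) {
    if (superstep == restartedFrom) {
      // The state at this superstep was just loaded from a checkpoint,
      // so writing it out again would only duplicate existing data.
      return false;
    }
    return superstep % frequency == 0;
  }

  public static void main(String[] args) {
    CheckpointPolicy fresh = new CheckpointPolicy(2, -1);
    CheckpointPolicy restarted = new CheckpointPolicy(2, 4);
    System.out.println(fresh.shouldCheckpoint(4));     // prints true
    System.out.println(restarted.shouldCheckpoint(4)); // prints false
    System.out.println(restarted.shouldCheckpoint(6)); // prints true
  }
}
```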