Return-Path:
X-Original-To: apmail-giraph-dev-archive@www.apache.org
Delivered-To: apmail-giraph-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 065F410260 for ; Tue, 11 Feb 2014 23:04:25 +0000 (UTC)
Received: (qmail 57946 invoked by uid 500); 11 Feb 2014 23:04:22 -0000
Delivered-To: apmail-giraph-dev-archive@giraph.apache.org
Received: (qmail 57880 invoked by uid 500); 11 Feb 2014 23:04:20 -0000
Mailing-List: contact dev-help@giraph.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@giraph.apache.org
Delivered-To: mailing list dev@giraph.apache.org
Received: (qmail 57859 invoked by uid 500); 11 Feb 2014 23:04:20 -0000
Delivered-To: apmail-incubator-giraph-dev@incubator.apache.org
Received: (qmail 57852 invoked by uid 99); 11 Feb 2014 23:04:20 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 11 Feb 2014 23:04:20 +0000
Date: Tue, 11 Feb 2014 23:04:20 +0000 (UTC)
From: "Roman Shaposhnik (JIRA)"
To: giraph-dev@incubator.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (GIRAPH-747) BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers to complete
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/GIRAPH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898460#comment-13898460 ]

Roman Shaposhnik commented on GIRAPH-747:
-----------------------------------------

[~initialcontext] any chance we can fix this for 1.1.0?
I guess you're the resident Giraph-on-YARN expert ;-)

> BspServiceMaster finishes ZooKeeper cleanup without waiting for all workers to complete
> ---------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-747
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-747
>             Project: Giraph
>          Issue Type: Bug
>   Affects Versions: 1.0.0
>            Reporter: Chuan Lei
>            Assignee: Chuan Lei
>             Fix For: 1.1.0
>
>         Attachments: GIRAPH-747.v1.patch
>
>
> In BspServiceMaster, the function cleanUpZooKeeper should wait for all workers and masters to complete. However, it appears that maxTasks only takes workers into consideration. Consequently, a straggling worker may fail to report to ZooKeeper because the path is removed too early. This causes a "No lease on path File does not exist" exception at runtime.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
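
The counting problem described in the issue can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in BspServiceMaster or in GIRAPH-747.v1.patch: the class CleanupBarrier, its methods, and the "done marker" names are all hypothetical, and a plain Set stands in for the children of the ZooKeeper cleanup path.

```java
import java.util.Set;

// Hypothetical illustration only; none of these names come from Giraph.
final class CleanupBarrier {
    private CleanupBarrier() {}

    // Number of tasks that must report before cleanup may proceed:
    // every worker plus every master, not the workers alone.
    static int expectedTasks(int workers, int masters) {
        return workers + masters;
    }

    // True only once every expected task has written its "done" marker.
    // doneMarkers stands in for the child nodes under the cleanup path.
    static boolean safeToCleanUp(Set<String> doneMarkers, int workers, int masters) {
        return doneMarkers.size() >= expectedTasks(workers, masters);
    }
}
```

With two workers and one master, a threshold of two (workers only) would already be met once the master and a single worker have reported, so the path could be removed while the second worker is still running. Counting masters as well raises the threshold to three and keeps the path in place until the straggler reports.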