Date: Tue, 24 May 2011 10:18:47 +0000 (UTC)
From: "Thomas Jungblut (JIRA)"
To: hama-dev@incubator.apache.org
Subject: [jira] [Issue Comment Edited] (HAMA-387) Add task ID and superstep count informations to lock file

[
https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038473#comment-13038473 ]

Thomas Jungblut edited comment on HAMA-387 at 5/24/11 10:16 AM:
----------------------------------------------------------------

Won't work.
{noformat}
java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/1_thomas-desktop:56492
{noformat}
ZooKeeper sucks? :D

EDIT:
We actually have to store the superstep count in the byte value of this lock. Then we have to fetch the data and deserialize it to check which superstep the node is in...

{noformat}
// Counts the children of bspRoot whose lock data equals the given superstep.
private int countGroomsInSuperStep(List<String> list, long superStep)
    throws KeeperException, InterruptedException {
  int count = 0;
  for (String groom : list) {
    // The znode data holds the serialized superstep count.
    byte[] data = zk.getData(bspRoot + "/" + groom, null, null);
    if (Bytes.toLong(data) == superStep) {
      count++;
    }
  }
  return count;
}
{noformat}

The loop then looks like:

{noformat}
while (true) {
  synchronized (mutex) {
    List<String> list = zk.getChildren(bspRoot, true);
    if (countGroomsInSuperStep(list, this.getSuperstepCount()) > 0) {
      // At least one groom is still in this superstep; wait until notified.
      mutex.wait();
    } else {
      LOG.debug("[" + getPeerName() + "] leave from the leaveBarrier");
      return true;
    }
  }
}
{noformat}

      was (Author: thomas.jungblut):
    Won't work.
{noformat}
java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/1_thomas-desktop:56492
{noformat}
ZooKeeper sucks? :D

EDIT:
We actually have to store the superstep count in the byte value of this lock. Then we have to fetch the data and deserialize it to check which superstep the node is in...
This is really crappy, we should open a feature ticket on the ZK project.
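
To make the "superstep count in the byte value of the lock" idea concrete, here is a minimal sketch that runs without ZooKeeper. It assumes the count is stored as an 8-byte big-endian long, which may or may not match what `Bytes.toLong` expects in Hama. The class `SuperstepLockSketch`, its methods, and the second and third node names are hypothetical. A plain `Map` stands in for the children of `bspRoot` and their data.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hama or ZooKeeper.
class SuperstepLockSketch {

  // Serialize the superstep as 8 bytes, big-endian (ByteBuffer's default order).
  static byte[] encode(long superstep) {
    return ByteBuffer.allocate(Long.BYTES).putLong(superstep).array();
  }

  // Deserialize the lock data back into a superstep count.
  static long decode(byte[] data) {
    if (data == null || data.length != Long.BYTES) {
      throw new IllegalArgumentException("expected " + Long.BYTES + " bytes of lock data");
    }
    return ByteBuffer.wrap(data).getLong();
  }

  // Stand-in for listing the children of bspRoot and reading each child's data:
  // counts the locks whose stored superstep equals the given one.
  static int countInSuperstep(Map<String, byte[]> locks, long superstep) {
    int count = 0;
    for (byte[] data : locks.values()) {
      if (decode(data) == superstep) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Node names map to lock data; only the first name appears in the comment above.
    Map<String, byte[]> locks = new LinkedHashMap<>();
    locks.put("1_thomas-desktop:56492", encode(3L));
    locks.put("2_thomas-desktop:56493", encode(3L));
    locks.put("3_thomas-desktop:56494", encode(4L));
    System.out.println(countInSuperstep(locks, 3L)); // prints 2
  }
}
```

Against a real ZooKeeper ensemble, a child returned by `getChildren` can be deleted before `getData` is called on it, which would surface as a `NoNodeException` like the one quoted above. A real implementation would probably need to treat that case as "node already left" rather than as a failure.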
> Add task ID and superstep count informations to lock file
> ---------------------------------------------------------
>
>                 Key: HAMA-387
>                 URL: https://issues.apache.org/jira/browse/HAMA-387
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>             Fix For: 0.3.0
>
>         Attachments: sleepless.patch
>
>
> I think the lock file must include:
> * the job ID
> * the task ID of the lock file owner
> * the current superstep count
> to check ownership and validity.
> Currently lock files are named by hostname, but multiple tasks may run on one groom server in the future.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira