Subject: Re: Question about the Barrier Java example on the ZooKeeper documentation
From: Mahadev Konar
Date: Wed, 9 Mar 2011 08:11:17 -0800
To: user@zookeeper.apache.org
Cc: Semih Salihoglu, Flavio Junqueira, Patrick Hunt

I just added you to the contributors list and assigned the jira to you.

thanks
mahadev

On Wed, Mar 9, 2011 at 1:55 AM, Semih Salihoglu wrote:

> I created a bug but I don't see a way to assign it to myself (or anyone
> actually). Here's the link:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1011.
>
> semih
>
>
> On Wed, Mar 9, 2011 at 1:30 AM, Flavio Junqueira wrote:
>
>> Hi Semih, Jira is the system we use to report and discuss zookeeper
>> issues:
>>
>> https://issues.apache.org/jira/browse/ZOOKEEPER
>>
>> Once you have an account, you can create a new issue, describe it, and
>> propose a fix to the problem at hand.
>>
>> -Flavio
>>
>> On Mar 8, 2011, at 10:13 PM, Semih Salihoglu wrote:
>>
>> Sure, I'll get to it this weekend probably.
>>
>> I don't know what jira is, so some information on how to do this would be
>> very helpful.
>>
>> Thank you,
>>
>> semih
>>
>> On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt wrote:
>>
>>> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira wrote:
>>>
>>>> I believe the goal of the examples was never to be a complete solution
>>>> to barriers or queues, but just to give a quick bootstrap to beginners. It
>>>> is true, though, that the documentation page does not make that claim, and
>>>> can be misleading.
>>>>
>>>> I see two possible action points out of this discussion:
>>>> 1- State clearly in the beginning that the example discussed is not
>>>> correct under the assumption that a process may finish the computation
>>>> before another has started, and that the example is there for
>>>> illustration purposes;
>>>> 2- Have another example following the current one that discusses the
>>>> problem and shows how to fix it.
>>>> This is an interesting option that
>>>> illustrates how one could reason about a solution when developing with
>>>> zookeeper.
>>>>
>>>
>>> This (2) sounds much better to me. Semih, would you like to give that a
>>> try? (updating the docs I mean)
>>>
>>> Patrick
>>>
>>>
>>>> If you are interested in helping us fix it, Semih, then you could
>>>> perhaps create a jira and assign yourself to fix it. I can help you out.
>>>>
>>>> -Flavio
>>>>
>>>> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>>>>
>>>> Hi Mahadev,
>>>>
>>>> Sorry for the late response. I agree. Actually, in this other
>>>> documentation, http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html,
>>>> where there is only the pseudo-code, I think this situation is avoided.
>>>> Here there is another znode, /ready, that all nodes have a watch on. After
>>>> each node writes its own ephemeral child, it doesn't wait on the child
>>>> count. It reads how many children have been written, the last one writes
>>>> the /ready znode, and everyone wakes up. The only race condition in this
>>>> one is that two nodes can try to write /ready and only one of them will
>>>> succeed, but this is ok.
>>>>
>>>> Thank you again,
>>>>
>>>> semih
>>>>
>>>> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar
>>>> wrote:
>>>>
>>>> Semih,
>>>>
>>>> You pointed it out right. It is possible to enter into a situation
>>>> like that. The recipe does have a bug. It can be fixed with the last
>>>> client creating a special znode and every node in the list watching
>>>> for that (so it'll be an indication for entering the barrier). no?
>>>>
>>>> thanks
>>>> mahadev
>>>>
>>>> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I am new to this group and to ZooKeeper. I was reading the Barrier
>>>> tutorial in one of the ZooKeeper documentations.
>>>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html.
>>>>
>>>> A barrier primitive is exactly how I want to use ZooKeeper. I have a
>>>> question about this example. It's not really a ZooKeeper question; it's
>>>> more a question about the Barrier primitive, I think. Here it is: in the
>>>> enter method of this Barrier implementation below
>>>>
>>>>     boolean enter() throws KeeperException, InterruptedException {
>>>>         // Announce this participant with an ephemeral sequential child.
>>>>         zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>>>>                 CreateMode.EPHEMERAL_SEQUENTIAL);
>>>>         while (true) {
>>>>             synchronized (mutex) {
>>>>                 // Re-count the children and leave a watch so that the
>>>>                 // next change to them wakes this thread.
>>>>                 List<String> list = zk.getChildren(root, true);
>>>>
>>>>                 if (list.size() < size) {
>>>>                     mutex.wait();
>>>>                 } else {
>>>>                     return true;
>>>>                 }
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>> could there be a race condition? Let's say there are two
>>>> machines/nodes, node1 and node2, that will use this code to synchronize
>>>> over ZK. Let's say the following steps take place:
>>>>
>>>> 1. node1 calls the zk.create method, then reads the number of
>>>>    children, sees that it's 1, and starts waiting.
>>>> 2. node2 calls the zk.create method (but doesn't call the
>>>>    zk.getChildren method yet; let's say it's very slow).
>>>> 3. node1 is notified that the number of children of the znode
>>>>    changed. It checks that the size is 2, so it enters the barrier,
>>>>    does its work, and then leaves the barrier, deleting its node.
>>>> 4. node2 calls zk.getChildren, and because node1 has already left,
>>>>    it sees that the number of children is 1. Since node1 will
>>>>    never enter the barrier again, node2 will keep waiting.
>>>>
>>>> Could this scenario happen? If not, what is preventing it? I haven't
>>>> copied the code that enters the barrier, does the work, and leaves the
>>>> barrier, but in the link I pasted above it's the
>>>> barrierTest(String args[]) method.
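The interleaving in steps 1 to 4 can be replayed deterministically. The Java sketch below is a hypothetical, single-threaded model, not the tutorial's code and not a ZooKeeper client: a TreeSet stands in for the children of the barrier znode, and SIZE is the barrier size of 2 used in the scenario. The class and method names (RaceReplay, replay) are made up for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, single-threaded replay of the interleaving in steps 1 to 4.
// A TreeSet stands in for the children of the barrier znode; no real
// ZooKeeper client is involved.
public class RaceReplay {
    static final int SIZE = 2; // barrier size used in the scenario

    /** Returns the child count that node2 observes in step 4. */
    static int replay() {
        Set<String> children = new TreeSet<>();

        // Step 1: node1 creates its child and reads the count (1 < SIZE), so it waits.
        children.add("node1");
        boolean node1Waits = children.size() < SIZE;

        // Step 2: node2 creates its child but has not called getChildren yet.
        children.add("node2");

        // Step 3: node1's watch fires; it re-reads the count (2 >= SIZE),
        // passes enter(), does its work, and deletes its child as it leaves.
        if (node1Waits && children.size() >= SIZE) {
            children.remove("node1");
        }

        // Step 4: node2 finally reads the count.
        return children.size();
    }

    public static void main(String[] args) {
        int seen = replay();
        System.out.println("node2 sees " + seen + " child(ren); stuck: " + (seen < SIZE));
    }
}
```

In this model node2 reads one child, fewer than SIZE, so in the real enter() it would block on mutex.wait() even though both participants did reach the barrier: the decision depends on a count that a departing peer has already shrunk.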
>>>>
>>>> Thank you very much in advance,
>>>>
>>>> semih
>>>>
>>>>
>>>> *flavio*
>>>> *junqueira*
>>>>
>>>> research scientist
>>>>
>>>> fpj@yahoo-inc.com
>>>> direct +34 93-183-8828
>>>>
>>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>>> phone (408) 349 3300 fax (408) 349 3301
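The fix discussed in the thread (the last participant to arrive creates a /ready znode that every participant watches) can be illustrated with another in-memory Java model. This is a sketch under stated assumptions, not the tutorial's code and not a ZooKeeper client: the class ReadyBarrier and its methods are hypothetical, a HashSet plays the barrier's children, a boolean plays /ready, and a CountDownLatch plays the watch each participant holds on /ready.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory model of the /ready barrier recipe. No ZooKeeper client
// is used: a Set stands in for the barrier's children, a boolean for the /ready
// znode, and a CountDownLatch for the watch every participant sets on /ready.
public class ReadyBarrier {
    private final int size;
    private final Set<String> children = new HashSet<>();
    private boolean ready = false;
    private final CountDownLatch readyWatch = new CountDownLatch(1);

    ReadyBarrier(int size) {
        this.size = size;
    }

    // Models creating this participant's child, then reading the child count.
    synchronized int createChildAndCount(String name) {
        children.add(name);
        return children.size();
    }

    synchronized void deleteChild(String name) {
        children.remove(name);
    }

    // Models create("/ready"): only the first caller succeeds; a later caller
    // corresponds to a "node exists" error, which is harmless here.
    synchronized boolean createReady() {
        if (ready) {
            return false;
        }
        ready = true;
        readyWatch.countDown(); // wakes every participant waiting on /ready
        return true;
    }

    // The participant that sees the full count creates /ready; everyone waits
    // for /ready and never re-counts children, so a peer that has already left
    // cannot strand anyone.
    void enter(String name) throws InterruptedException {
        if (createChildAndCount(name) >= size) {
            createReady();
        }
        // A latch that is already released returns at once, so a /ready created
        // before this participant starts waiting is not missed.
        readyWatch.await();
    }

    // Runs n participants that enter and then leave at once (like node1 in step 3).
    // Returns true if all of them pass the barrier within five seconds.
    static boolean runAll(int n) {
        ReadyBarrier barrier = new ReadyBarrier(n);
        CountDownLatch passed = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            String name = "node" + i;
            Thread t = new Thread(() -> {
                try {
                    barrier.enter(name);
                    passed.countDown();
                    barrier.deleteChild(name);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }
        try {
            return passed.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("all passed: " + runAll(8));
    }
}
```

The design point is that a waiting participant reacts only to /ready, which in this model is never deleted, instead of re-reading a child count that departing peers can shrink. Because nobody can leave before /ready exists, the participant whose child brings the count to size is guaranteed to see it. Against a real ensemble the latch would presumably be replaced by a watch set with exists() on the /ready path, and the losing creator would simply ignore KeeperException.NodeExistsException.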