Subject: Re: Question about the Barrier Java example on the ZooKeeper documentation
From: Semih Salihoglu
To: user@zookeeper.apache.org
Cc: Patrick Hunt, Flavio Junqueira
Date: Tue, 8 Mar 2011 13:13:06 -0800

Sure, I'll probably get to it this weekend. I don't know what JIRA is, so
some information on how to do this would be very helpful.

Thank you,

semih

On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt wrote:

> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira wrote:
>
>> I believe the goal of the examples was never to be a complete solution
>> for barriers or queues, but just to give a quick bootstrap to beginners.
>> It is true, though, that the documentation page does not make that
>> claim, and can be misleading.
>>
>> I see two possible action points out of this discussion:
>> 1- State clearly at the beginning that the example discussed is not
>>    correct under the assumption that a process may finish the
>>    computation before another has started, and that the example is
>>    there for illustration purposes;
>> 2- Have another example following the current one that discusses the
>>    problem and shows how to fix it. This is an interesting option that
>>    illustrates how one could reason about a solution when developing
>>    with ZooKeeper.
>>
>
> This (2) sounds much better to me. Semih, would you like to give that a
> try? (updating the docs I mean)
>
> Patrick
>
>> If you are interested in helping us fix it, Semih, then you could
>> perhaps create a JIRA and assign yourself to fix it. I can help you out.
>>
>> -Flavio
>>
>> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>>
>> Hi Mahadev,
>>
>> Sorry for the late response. I agree. Actually, in this other
>> documentation,
>> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html , where
>> there is only pseudo-code, I think this situation is avoided. There,
>> another znode, /ready, exists that all nodes have a watch on. After each
>> node writes its own ephemeral child, it doesn't wait; it reads how many
>> children have been written, and the last one writes the /ready znode and
>> everyone wakes up. The only race condition in this one is that two nodes
>> may try to write /ready and only one of them will succeed, but this is
>> OK.
>>
>> Thank you again,
>>
>> semih
>>
>> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar wrote:
>>
>> Semih,
>>
>> You pointed it out right. It is possible to enter into a situation
>> like that. The recipe does have a bug. It can be fixed with the last
>> client creating a special znode and every node in the list watching
>> for that (so it'll be an indication for entering the barrier). No?
>>
>> thanks
>> mahadev
>>
>> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu wrote:
>>
>> Hi All,
>>
>> I am new to this group and to ZooKeeper. I was reading the Barrier
>> tutorial in the ZooKeeper documentation:
>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html .
>> A barrier primitive is exactly how I want to use ZooKeeper. I have a
>> question about this example. It's not really a ZooKeeper question, it's
>> more a question about the Barrier primitive, I think. Here it is: in the
>> enter method of the Barrier implementation below
>>
>>     boolean enter() throws KeeperException, InterruptedException {
>>         zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>>                 CreateMode.EPHEMERAL_SEQUENTIAL);
>>         while (true) {
>>             synchronized (mutex) {
>>                 List<String> list = zk.getChildren(root, true);
>>
>>                 if (list.size() < size) {
>>                     mutex.wait();
>>                 } else {
>>                     return true;
>>                 }
>>             }
>>         }
>>     }
>>
>> could there be a race condition? Let's say there are two
>> machines/nodes, node1 and node2, that will use this code to synchronize
>> over ZK. Let's say the following steps take place:
>>
>>   1. node1 calls the zk.create method and then reads the number of
>>      children, sees that it's 1, and starts waiting.
>>   2. node2 calls the zk.create method (it doesn't call the
>>      zk.getChildren method yet; let's say it's very slow).
>>   3. node1 is notified that the number of children of the znode
>>      changed. It checks that the size is 2, so it returns from enter(),
>>      does its work, and then leaves the barrier, deleting its node.
>>   4. node2 calls zk.getChildren and, because node1 has already left,
>>      it sees that the number of children is equal to 1. Since node1 will
>>      never enter the barrier again, node2 will keep waiting.
>>
>> Could this scenario happen? If not, what is preventing it? I haven't
>> copied the piece of code that enters the barrier, does the work, and
>> leaves the barrier, but in the link I pasted above it's the
>> barrierTest(String args[]) method.
>>
>> Thank you very much in advance,
>>
>> semih
>>
>> flavio junqueira
>> research scientist
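
For readers following the thread, the four-step interleaving in the original question and the /ready idea from the recipes page can be replayed without a ZooKeeper server. The following is only a minimal in-memory sketch: the class BarrierRaceSketch and all of its methods are hypothetical stand-ins, not the ZooKeeper client API. It runs the steps sequentially rather than with real threads or watches, and it assumes /ready is never deleted once created.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for the barrier znode; NOT the ZooKeeper API. */
class BarrierRaceSketch {
    // Children of the barrier znode, one ephemeral child per participant.
    private final Set<String> children = new HashSet<>();
    // Models the existence of the special /ready znode (assumed never deleted).
    private boolean ready = false;
    private final int size;

    BarrierRaceSketch(int size) { this.size = size; }

    void create(String name) { children.add(name); }
    void delete(String name) { children.remove(name); }

    /** Check used by the tutorial's enter(): are there at least `size` children? */
    boolean countCheckPasses() { return children.size() >= size; }

    /** /ready variant: whoever counts `size` children creates /ready. */
    boolean readyCheckPasses() {
        if (children.size() >= size) {
            ready = true; // a second creator would simply fail, which is harmless
        }
        return ready;
    }

    /** Replays steps 1-4 with the count-based check; returns whether node2 gets in. */
    static boolean node2EntersWithCountCheck() {
        BarrierRaceSketch b = new BarrierRaceSketch(2);
        b.create("node1");                      // step 1: node1 creates its child
        boolean node1In = b.countCheckPasses(); //         sees 1 child, waits
        b.create("node2");                      // step 2: node2 creates, is slow to read
        node1In = b.countCheckPasses();         // step 3: node1 sees 2 children, enters
        b.delete("node1");                      //         works, leaves, deletes its node
        return node1In && b.countCheckPasses(); // step 4: node2 sees only 1 child
    }

    /** Same arrival order, but nodes wait on /ready instead of the child count. */
    static boolean node2EntersWithReadyZnode() {
        BarrierRaceSketch b = new BarrierRaceSketch(2);
        b.create("node1");
        boolean node1In = b.readyCheckPasses(); // 1 child: no /ready yet, node1 waits
        b.create("node2");                      // node1 is not woken by this change
        boolean node2In = b.readyCheckPasses(); // node2 counts 2 and creates /ready
        node1In = b.ready;                      // node1's watch on /ready would fire
        b.delete("node1");                      // node1 works and leaves
        return node1In && node2In;              // node2 had already passed the barrier
    }

    public static void main(String[] args) {
        System.out.println("count check, node2 enters: " + node2EntersWithCountCheck());
        System.out.println("/ready znode, node2 enters: " + node2EntersWithReadyZnode());
    }
}
```

In this replay node1 cannot leave early in the /ready variant, because nothing wakes it until the last arrival has counted all the children and written /ready. A real implementation would use watches and handle the case where two clients both try to create /ready.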