Date: Wed, 9 Mar 2011 01:55:44 -0800
Subject: Re: Question about the Barrier Java example on the ZooKeeper documentation
From: Semih Salihoglu
To: user@zookeeper.apache.org
Cc: Flavio Junqueira, Patrick Hunt

I created a bug, but I don't see a way to assign it to myself (or to anyone,
actually). Here's the link:
https://issues.apache.org/jira/browse/ZOOKEEPER-1011.

semih

On Wed, Mar 9, 2011 at 1:30 AM, Flavio Junqueira wrote:

> Hi Semih, Jira is the system we use to report and discuss ZooKeeper issues:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Once you have an account, you can create a new issue, describe it, and
> propose a fix to the problem at hand.
>
> -Flavio
>
> On Mar 8, 2011, at 10:13 PM, Semih Salihoglu wrote:
>
> Sure, I'll get to it this weekend probably.
>
> I don't know what Jira is, so some information on how to do this would be
> very helpful.
>
> Thank you,
>
> semih
>
> On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt wrote:
>
>> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira wrote:
>>
>>> I believe the goal of the examples was never to be complete solutions
>>> to barriers or queues, but just to give a quick bootstrap to beginners.
>>> It is true, though, that the documentation page does not make that
>>> claim, and can be misleading.
>>>
>>> I see two possible action points out of this discussion:
>>> 1- State clearly in the beginning that the example discussed is not
>>> correct under the assumption that a process may finish the computation
>>> before another has started, and that the example is there for
>>> illustration purposes;
>>> 2- Have another example following the current one that discusses the
>>> problem and shows how to fix it. This is an interesting option that
>>> illustrates how one could reason about a solution when developing with
>>> ZooKeeper.
>>>
>>
>> This (2) sounds much better to me. Semih, would you like to give that a
>> try? (updating the docs I mean)
>>
>> Patrick
>>
>>> If you are interested in helping us fix it, Semih, then you could
>>> perhaps create a jira and assign yourself to fix it. I can help you out.
>>>
>>> -Flavio
>>>
>>> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>>>
>>> Hi Mahadev,
>>>
>>> Sorry for the late response. I agree. Actually, in this other
>>> documentation,
>>> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there
>>> is only the pseudo-code, I think this situation is avoided. There, there
>>> is another znode, /ready, that all nodes have a watch on. After each
>>> node writes its own ephemeral child, it doesn't wait on the children.
>>> The nodes read how many children have been written, and the last one
>>> writes the /ready znode and everyone wakes up. The only race condition
>>> in this one is that there can be two nodes trying to write /ready and
>>> only one of them will succeed, but this is ok.
>>>
>>> Thank you again,
>>>
>>> semih
>>>
>>> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar wrote:
>>>
>>> Semih,
>>> You pointed it out right. It is possible to enter into a situation
>>> like that. The recipe does have a bug. It can be fixed with the last
>>> client creating a special znode and every node in the list watching
>>> for that (so it'll be an indication for entering the barrier). no?
>>>
>>> thanks
>>> mahadev
>>>
>>> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu wrote:
>>>
>>> Hi All,
>>>
>>> I am new to this group and to ZooKeeper. I was reading the Barrier
>>> tutorial in one of the ZooKeeper documentations:
>>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html .
>>> A barrier primitive is exactly how I want to use ZooKeeper. I have a
>>> question about this example. It's not really a ZooKeeper question, it's
>>> more a question about the Barrier primitive I think. Here it is: In the
>>> enter method of this Barrier implementation below
>>>
>>> boolean enter() throws KeeperException, InterruptedException {
>>>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>>>             CreateMode.EPHEMERAL_SEQUENTIAL);
>>>     while (true) {
>>>         synchronized (mutex) {
>>>             List<String> list = zk.getChildren(root, true);
>>>
>>>             if (list.size() < size) {
>>>                 mutex.wait();
>>>             } else {
>>>                 return true;
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>> could there be a race condition? Let's say there are two
>>> machines/nodes, node1 and node2, that will use this code to synchronize
>>> over ZK. Let's say the following steps take place:
>>>
>>>   1. node1 calls the zk.create method, then reads the number of
>>>      children, sees that it's 1, and starts waiting.
>>>   2. node2 calls the zk.create method (it doesn't call the
>>>      zk.getChildren method yet; let's say it's very slow).
>>>   3. node1 is notified that the number of children on the znode
>>>      changed. It checks that the size is 2, so it stops waiting, does
>>>      its work, and then leaves the barrier, deleting its node.
>>>   4. node2 calls zk.getChildren and, because node1 has already left,
>>>      sees that the number of children is equal to 1. Since node1 will
>>>      never enter the barrier again, node2 will keep waiting.
>>>
>>> Could this scenario happen? If not, what is preventing this? I haven't
>>> copied the code piece that enters the barrier, does work, and leaves the
>>> barrier. But in the link I pasted above, it's the
>>> barrierTest(String args[]) method.
>>>
>>> Thank you very much in advance,
>>>
>>> semih
>>>
>>> *flavio* *junqueira*
>>> research scientist
>>> fpj@yahoo-inc.com
>>> direct +34 93-183-8828
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300 fax (408) 349 3301
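
The four-step interleaving from the original question can be replayed deterministically with a small in-memory model. The sketch below is not ZooKeeper client code: the class `BarrierRaceReplay`, the method `childrenSeenByNode2`, and the set standing in for the children of the barrier znode are hypothetical names chosen for illustration. It assumes, as the thread does, that leaving the barrier deletes the node's own child.

```java
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for the children of the barrier znode.
// Hypothetical illustration only; nothing here talks to ZooKeeper.
class BarrierRaceReplay {
    static final int SIZE = 2; // barrier size in the two-node scenario

    // Replays steps 1-4 in order and returns the child count node2 observes.
    static int childrenSeenByNode2() {
        Set<String> children = new TreeSet<>();

        // Step 1: node1 creates its child, sees 1 < SIZE, and starts waiting.
        children.add("node1");
        boolean node1Waits = children.size() < SIZE;

        // Step 2: node2 creates its child but has not listed the children yet.
        children.add("node2");

        // Step 3: node1's watch fires; it now sees SIZE children, stops
        // waiting, does its work, and leaves, which deletes its child.
        if (node1Waits && children.size() >= SIZE) {
            children.remove("node1");
        }

        // Step 4: node2 finally lists the children.
        return children.size();
    }

    public static void main(String[] args) {
        int seen = childrenSeenByNode2();
        System.out.println("node2 sees " + seen + " of " + SIZE
                + " children; keeps waiting: " + (seen < SIZE));
    }
}
```

In this replay node2 observes one child while the barrier size is two, which is the condition under which the quoted `enter()` loop calls `mutex.wait()` with nobody left to wake it.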

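
The fix suggested in the thread, a separate /ready znode that the last arrival creates and that every waiter watches, can be modeled in memory. This is a sketch under stated assumptions, not the recipe's actual code: `ReadyZnodeBarrierModel` and all of its methods are hypothetical, a boolean stands in for /ready, and the model assumes /ready stays in place when a node leaves.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// In-memory model of the /ready scheme discussed in the thread.
// Hypothetical illustration only; nothing here talks to ZooKeeper.
class ReadyZnodeBarrierModel {
    private final int size;                               // nodes the barrier waits for
    private final Set<String> children = new HashSet<>(); // stands in for ephemeral children
    private boolean ready = false;                        // stands in for the /ready znode

    ReadyZnodeBarrierModel(int size) {
        this.size = size;
    }

    // Equivalent of a node creating its own ephemeral child.
    void register(String name) {
        children.add(name);
    }

    // Equivalent of counting the children once. A node that sees a full
    // barrier creates /ready (idempotent in this model). Returns whether
    // /ready exists afterwards.
    boolean countAndMaybePublish() {
        if (children.size() >= size) {
            ready = true;
        }
        return ready;
    }

    // What a waiter's watch on /ready would report.
    boolean isReady() {
        return ready;
    }

    // Leaving deletes the node's child; /ready is assumed to remain.
    void leave(String name) {
        children.remove(name);
    }

    // Replays the slow-node2 interleaving and records each observation.
    static boolean[] replay() {
        ReadyZnodeBarrierModel barrier = new ReadyZnodeBarrierModel(2);
        barrier.register("node1");
        boolean node1PassesAtOnce = barrier.countAndMaybePublish(); // 1 child: waits
        barrier.register("node2");                                   // node2 has not counted yet
        boolean node1WokenEarly = barrier.isReady();                 // children changed, /ready did not
        boolean node2Passes = barrier.countAndMaybePublish();        // 2 children: creates /ready
        boolean node1PassesAfterReady = barrier.isReady();
        barrier.leave("node1");
        boolean readySurvivesLeave = barrier.isReady();
        return new boolean[] {node1PassesAtOnce, node1WokenEarly, node2Passes,
                node1PassesAfterReady, readySurvivesLeave};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replay()));
    }
}
```

The design point is that waiters react only to /ready, not to changes in the child list, so node1 cannot get ahead of node2 and delete its child before node2 has counted the children.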