zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Question about the Barrier Java example on the ZooKeeper documentation
Date Tue, 08 Mar 2011 16:31:13 GMT
On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:

> I believe the goal of the examples was never to be complete solutions to
> barriers or queues, but just to give a quick bootstrap to beginners. It is
> true, though, that the documentation page does not make that clear, and so
> it can be misleading.
>
> I see two possible action points out of this discussion:
> 1- State clearly in the beginning that the example discussed is not correct
> under the assumption that a process may finish the computation before
> another has started, and the example is there for illustration purposes;
> 2- Have another example following the current one that discusses the
> problem and shows how to fix it. This is an interesting option that
> illustrates how one could reason about a solution when developing with
> zookeeper.
>
>
This (2) sounds much better to me. Semih, would you like to give that a try?
(updating the docs I mean)

Patrick


> If you are interested in helping us fix it, Semih, then you could perhaps
> create a jira and assign yourself to fix it. I can help you out.
>
> -Flavio
>
> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>
> Hi Mahadev,
>
> Sorry for the late response. I agree. Actually, in this other documentation,
> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there is
> only the pseudo-code, I think this situation is avoided. There, another
> znode, /ready, is used, and all nodes have a watch on it. After each node
> writes its own ephemeral child, it doesn't wait. It reads how many children
> have been written, and the last one writes the /ready znode and everyone
> wakes up. The only race condition in this one is that there can be two nodes
> trying to write /ready and only one of them will succeed, but this is ok.
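
The /ready scheme described above can be modelled in plain Java. This is a minimal in-memory sketch, not a ZooKeeper client: the class name ReadyBarrierModel and its methods are hypothetical, a CountDownLatch stands in for the watch on /ready, and AtomicBoolean.compareAndSet stands in for the create of /ready that only one node can win. It models a single-use barrier only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory model of the /ready barrier; not a ZooKeeper client. */
public class ReadyBarrierModel {
    // Stand-in for the ephemeral children under the barrier root.
    private final Set<String> children = ConcurrentHashMap.newKeySet();
    // Stand-in for the /ready znode: once created, it stays created.
    private final AtomicBoolean readyCreated = new AtomicBoolean(false);
    // Stand-in for the watch every node sets on /ready.
    private final CountDownLatch readyWatch = new CountDownLatch(1);
    private final int size;

    public ReadyBarrierModel(int size) {
        this.size = size;
    }

    /** Returns true if this caller is the one whose create of /ready succeeded. */
    public boolean enter(String name) throws InterruptedException {
        children.add(name);                  // write own ephemeral child
        boolean wonCreate = false;
        if (children.size() >= size) {       // read how many have been written
            // Several late arrivals may try this; only one succeeds, which is fine.
            wonCreate = readyCreated.compareAndSet(false, true);
            readyWatch.countDown();          // wake everyone watching /ready
        }
        readyWatch.await();                  // wait on /ready, not on the child count
        return wonCreate;
    }

    public void leave(String name) {
        children.remove(name);               // delete own child
    }

    /** Runs two nodes that leave right after entering; returns how many got through. */
    public static int demo() {
        ReadyBarrierModel barrier = new ReadyBarrierModel(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger passed = new AtomicInteger();
        for (String name : new String[] {"node1", "node2"}) {
            pool.submit(() -> {
                barrier.enter(name);
                barrier.leave(name);         // leaving early cannot strand the other node
                passed.incrementAndGet();
                return null;
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return passed.get();
    }

    public static void main(String[] args) {
        System.out.println("nodes through the barrier: " + demo());
    }
}
```

Because no node proceeds until /ready exists, and /ready is only created by a node that has seen the full child count, a fast node deleting its child afterwards does not affect a slow one.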
>
> Thank you again,
>
> semih
>
> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar <mahadev@apache.org> wrote:
>
> Semih,
>
> You pointed it out right. It is possible to enter into a situation
> like that. The recipe does have a bug. It can be fixed with the last
> client creating a special znode and every node in the list watching
> for that (so it'll be an indication for entering the barrier). No?
>
>
> thanks
>
> mahadev
>
>
> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <semih@stanford.edu> wrote:
>
> Hi All,
>
> I am new to this group and to ZooKeeper. I was reading the Barrier tutorial
> in the ZooKeeper documentation,
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html .
> A barrier primitive is exactly how I want to use ZooKeeper. I have a question
> about this example. It's not really a ZooKeeper question; it's more a
> question about the Barrier primitive, I think. Here it is: in the enter
> method of this Barrier implementation below,
>
>
> boolean enter() throws KeeperException, InterruptedException {
>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>             CreateMode.EPHEMERAL_SEQUENTIAL);
>     while (true) {
>         synchronized (mutex) {
>             List<String> list = zk.getChildren(root, true);
>
>             if (list.size() < size) {
>                 mutex.wait();
>             } else {
>                 return true;
>             }
>         }
>     }
> }
>
>
> could there be a race condition? Let's say there are two
> machines/nodes, node1 and node2, that will use this code to synchronize
> over ZK. Let's say the following steps take place:
>
>  1. node1 calls the zk.create method and then reads the number of
>     children, sees that it's 1, and starts waiting.
>  2. node2 calls the zk.create method (it doesn't call the
>     zk.getChildren method yet; let's say it's very slow).
>  3. node1 is notified that the number of children on the znode
>     changed. It checks that the size is 2, so it returns from enter,
>     does its work, and then leaves the barrier, deleting its node.
>  4. node2 calls zk.getChildren and, because node1 has already left,
>     sees that the number of children is equal to 1. Since node1 will
>     never enter the barrier again, node2 will keep waiting.
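
The four steps above can be replayed deterministically. This is a minimal sketch under stated assumptions, not the tutorial code: the class BarrierRaceReplay is hypothetical, an in-memory set stands in for the children of the barrier znode, and mayProceed() mirrors the `list.size() < size` check in enter, where false means the node would call mutex.wait().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory replay of the interleaving; not a ZooKeeper client. */
public class BarrierRaceReplay {
    // Stand-in for the ephemeral children under the barrier root znode.
    private final TreeSet<String> children = new TreeSet<>();
    private final int size;

    public BarrierRaceReplay(int size) {
        this.size = size;
    }

    public void create(String name) {
        children.add(name);      // models zk.create of the node's child
    }

    public void delete(String name) {
        children.remove(name);   // models the node leaving the barrier
    }

    /** Same test as in enter(): false means the caller would wait. */
    public boolean mayProceed() {
        return children.size() >= size;
    }

    /** Replays steps 1 to 4 with a barrier of size 2 and records each check. */
    public static List<String> replay() {
        BarrierRaceReplay barrier = new BarrierRaceReplay(2);
        List<String> log = new ArrayList<>();

        barrier.create("node1");
        log.add("1. node1 proceeds: " + barrier.mayProceed());

        barrier.create("node2");
        log.add("2. node2 created its child, has not read yet");

        log.add("3. node1 proceeds: " + barrier.mayProceed());
        barrier.delete("node1");  // node1 finishes its work and leaves

        log.add("4. node2 proceeds: " + barrier.mayProceed());
        return log;
    }

    public static void main(String[] args) {
        replay().forEach(System.out::println);
    }
}
```

In this replay node2's first read happens only after node1's child is gone, so its check fails even though both nodes did reach the barrier.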
>
>
> Could this scenario happen? If not, what is preventing this? I haven't
> copied the code piece that enters barrier-does work-leaves barrier.
> But in the link I pasted above, it's the barrierTest(String args[])
> method.
>
> Thank you very much in advance,
>
> semih
>
>
>
>
> *flavio*
> *junqueira*
>
> research scientist
>
> fpj@yahoo-inc.com
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301
>
>
>
