zookeeper-user mailing list archives

From Semih Salihoglu <se...@stanford.edu>
Subject Re: Question about the Barrier Java example on the ZooKeeper documentation
Date Mon, 07 Mar 2011 10:23:49 GMT
Hi Mahadev,

Sorry for the late response. I agree. Actually, in this other documentation,
http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there is
only the pseudo-code, I think this situation is avoided. There, all nodes
have a watch on another znode, /ready. After each node writes its own
ephemeral child, it doesn't wait on the child list. It reads how many
children have been written, and the last one writes the /ready znode and
everyone wakes up. The only race condition in this one is that two nodes can
try to write /ready and only one of them will succeed, but this is OK.
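
The /ready idea described above can be modelled without a ZooKeeper server. The following is only a sketch under stated assumptions: `ReadyBarrierSketch` and its methods are hypothetical names, a `HashSet` stands in for the ephemeral children, and a `CountDownLatch` stands in for the watch on /ready. It does not use the ZooKeeper client API, and `leave` is simplified to a plain delete.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory model of the /ready recipe; none of this is the ZooKeeper API.
class ReadyBarrierSketch {
    private final Set<String> children = new HashSet<>();       // stands in for the ephemeral children
    private final CountDownLatch ready = new CountDownLatch(1); // stands in for the watch on /ready
    private final int size;

    ReadyBarrierSketch(int size) { this.size = size; }

    // Create and count are separate steps, as they would be separate calls to a server.
    private synchronized void create(String name) { children.add(name); }
    private synchronized int childCount() { return children.size(); }
    private synchronized void delete(String name) { children.remove(name); }

    void enter(String name) throws InterruptedException {
        create(name);
        if (childCount() >= size) {
            // The last arrival "creates /ready". A second countDown() is a no-op,
            // mirroring the harmless race where two nodes both try to write /ready.
            ready.countDown();
        }
        // Nobody waits on the child count; everyone waits only for /ready.
        ready.await();
    }

    void leave(String name) { delete(name); }

    // Starts `size` threads and returns how many of them got past the barrier.
    static int run(int size) {
        ReadyBarrierSketch barrier = new ReadyBarrierSketch(size);
        AtomicInteger passed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            String name = "node" + i;
            Thread t = new Thread(() -> {
                try {
                    barrier.enter(name);
                    passed.incrementAndGet();
                    barrier.leave(name); // leaving early cannot strand a slower node
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(2000); // bounded wait so a stuck barrier would not hang the demo
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return passed.get();
    }

    public static void main(String[] args) {
        System.out.println("nodes past the barrier: " + run(3));
    }
}
```

Because /ready is never removed in this model, a node that leaves early cannot make a slower node miss the signal, which is the property the original enter() loop lacks.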

Thank you again,

semih

On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar <mahadev@apache.org> wrote:

> Semih,
>  You pointed it out right. It is possible to enter into a situation
> like that. The recipe does have a bug. It can be fixed with the last
> client creating a special znode and every node in the list watching
> for that (so it'll be an indication for entering the barrier). No?
>
> thanks
> mahadev
>
> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <semih@stanford.edu>
> wrote:
> > Hi All,
> >
> > I am new to this group and to ZooKeeper. I was reading the Barrier
> > tutorial in the ZooKeeper documentation:
> > http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html .
> > A barrier primitive is exactly how I want to use ZooKeeper. I have a
> > question about this example. It's not really a ZooKeeper question, it's
> > more a question about the Barrier primitive I think. Here it is: In the
> > enter method of this Barrier implementation below
> >
> > boolean enter() throws KeeperException, InterruptedException{
> >            zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
> >                    CreateMode.EPHEMERAL_SEQUENTIAL);
> >            while (true) {
> >                synchronized (mutex) {
> >                    List<String> list = zk.getChildren(root, true);
> >
> >                    if (list.size() < size) {
> >                        mutex.wait();
> >                    } else {
> >                        return true;
> >                    }
> >                }
> >            }
> >        }
> >
> > could there be a race condition? Let's say there are two
> > machines/nodes: node1 and node2 that will use this code to synchronize
> > over ZK. Let's say the following steps take place:
> >
> >
> >   1. node1 calls the zk.create method and then reads the number of
> > children, and sees that it's 1 and starts waiting.
> >   2. node2 calls the zk.create method (doesn't call the
> > zk.getChildren method yet, let's say it's very slow)
> >   3. node1 is notified that the number of children on the znode
> > changed, it checks that the size is 2 so it enters the barrier, it
> > does its work and then leaves the barrier, deleting its node.
> >   4. node2 calls zk.getChildren and because node1 has already left,
> > it sees that the number of children is equal to 1. Since node1 will
> > never enter the barrier again, it will keep waiting.
> >
> > Could this scenario happen? If not, what is preventing this? I haven't
> > copied the code piece that enters barrier-does work-leaves barrier.
> > But in the link I pasted above, it's the barrierTest(String args[])
> > method.
> >
> > Thank you very much in advance,
> >
> > semih
> >
>
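
The four-step interleaving in the quoted message can be replayed deterministically. This is a minimal sketch, assuming a plain `HashSet` as a stand-in for the children of the barrier znode; `BarrierRaceReplay` and `node2Blocks` are hypothetical names, and no ZooKeeper calls are involved.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical replay of the interleaving; the set models the children of the barrier znode.
class BarrierRaceReplay {
    static boolean node2Blocks() {
        final int size = 2;                              // barrier size
        Set<String> children = new HashSet<>();

        children.add("node1");                           // step 1: node1 creates its child
        boolean node1Waits = children.size() < size;     // node1 sees 1 child and waits

        children.add("node2");                           // step 2: node2 creates, has not read yet

        boolean node1Passes = children.size() >= size;   // step 3: node1 wakes and sees 2 children
        children.remove("node1");                        // node1 does its work and leaves

        boolean node2Waits = children.size() < size;     // step 4: node2 finally reads, sees 1 child

        return node1Waits && node1Passes && node2Waits;
    }

    public static void main(String[] args) {
        System.out.println("node2 left waiting: " + node2Blocks()); // prints "node2 left waiting: true"
    }
}
```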
