zookeeper-user mailing list archives

From Mahadev Konar <maha...@apache.org>
Subject Re: Question about the Barrier Java example on the ZooKeeper documentation
Date Sun, 06 Mar 2011 02:41:14 GMT
  You are right. It is possible to get into a situation like that; the
recipe does have a bug. It can be fixed by having the last client create
a special znode and having every client in the list watch for it (so
its creation is the signal to enter the barrier), no?
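
To make the interleaving concrete, here is a rough sketch that replays steps 1-4 from the quoted message against an in-memory stand-in for the znodes. It does not use the ZooKeeper API, and the class and method names (BarrierRaceSketch, canEnterOriginal, canEnterWithReadyFlag) are made up for illustration. It also assumes that whichever client first observes the full count creates the flag, which is a small variation on "the last client" above.

```java
import java.util.HashSet;
import java.util.Set;

// In-memory stand-in for the barrier's znodes. This is NOT the ZooKeeper API;
// all names here are hypothetical and exist only to replay the interleaving.
class BarrierRaceSketch {
    private final Set<String> children = new HashSet<>(); // children of the barrier root
    private boolean ready = false;                        // models the special "ready" znode
    private final int size;                               // number of clients the barrier waits for

    BarrierRaceSketch(int size) {
        this.size = size;
    }

    void create(String name) { children.add(name); }
    void delete(String name) { children.remove(name); }

    // Check used by the tutorial's enter(): compare the current child count to size.
    boolean canEnterOriginal() {
        return children.size() >= size;
    }

    // Check with the fix: the first client to observe a full barrier creates the
    // ready flag, and the flag stays set even after children are deleted.
    boolean canEnterWithReadyFlag() {
        if (!ready && children.size() >= size) {
            ready = true;
        }
        return ready;
    }

    // Replays steps 1-4 with the original check; returns whether node2 gets in.
    static boolean replayOriginal() {
        BarrierRaceSketch zk = new BarrierRaceSketch(2);
        zk.create("node1");
        zk.canEnterOriginal();        // step 1: node1 sees 1 child and waits
        zk.create("node2");           // step 2: node2 creates its node, then stalls
        zk.canEnterOriginal();        // step 3: node1 sees 2 children and enters
        zk.delete("node1");           //         node1 does its work and leaves
        return zk.canEnterOriginal(); // step 4: node2 sees only 1 child
    }

    // Replays the same steps with the ready flag; returns whether node2 gets in.
    static boolean replayWithReadyFlag() {
        BarrierRaceSketch zk = new BarrierRaceSketch(2);
        zk.create("node1");
        zk.canEnterWithReadyFlag();        // step 1: node1 sees 1 child and waits
        zk.create("node2");                // step 2: node2 creates its node, then stalls
        zk.canEnterWithReadyFlag();        // step 3: node1 sees 2 children, sets ready, enters
        zk.delete("node1");                //         node1 does its work and leaves
        return zk.canEnterWithReadyFlag(); // step 4: node2 sees the ready flag
    }

    public static void main(String[] args) {
        System.out.println("original check, node2 enters: " + replayOriginal());
        System.out.println("ready flag, node2 enters: " + replayWithReadyFlag());
    }
}
```

In a real client the flag would be a znode kept out of the counted children, and each waiting client would set a watch on it (for example with zk.exists(path, true)) instead of polling; that part is left out of this sketch.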


On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <semih@stanford.edu> wrote:
> Hi All,
> I am new to this group and to ZooKeeper. I was reading the Barrier tutorial
> in the ZooKeeper documentation:
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html . A
> barrier primitive is exactly how I want to use ZooKeeper. I have a question
> about this example. It's not really a ZooKeeper question; I think it's more
> a question about the Barrier primitive. Here it is: in the enter
> method of the Barrier implementation below
> boolean enter() throws KeeperException, InterruptedException {
>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>             CreateMode.EPHEMERAL_SEQUENTIAL);
>     while (true) {
>         synchronized (mutex) {
>             List<String> list = zk.getChildren(root, true);
>             if (list.size() < size) {
>                 mutex.wait();
>             } else {
>                 return true;
>             }
>         }
>     }
> }
> could there be a race condition? Let's say there are two
> machines/nodes: node1 and node2 that will use this code to synchronize
> over ZK. Let's say the following steps take place:
>   1. node1 calls the zk.create method and then reads the number of
> children, and sees that it's 1 and starts waiting.
>   2. node2 calls the zk.create method (doesn't call the
> zk.getChildren method yet, let's say it's very slow)
>   3. node1 is notified that the number of children on the znode
> changed; it sees that the size is 2, so it returns from enter(),
> does its work, and then leaves the barrier, deleting its node.
>   4. node2 calls zk.getChildren and, because node1 has already left,
> it sees that the number of children is equal to 1. Since node1 will
> never enter the barrier again, node2 will keep waiting forever.
> Could this scenario happen? If not, what is preventing it? I haven't
> copied the piece of code that enters the barrier, does the work, and
> leaves the barrier, but in the link I pasted above it's the
> barrierTest(String args[]) method.
> Thank you very much in advance,
> semih
