Hi Semih,

Jira is the system we use to report and discuss ZooKeeper issues:

https://issues.apache.org/jira/browse/ZOOKEEPER

Once you have an account, you can create a new issue, describe it, and
propose a fix to the problem at hand.

-Flavio

On Mar 8, 2011, at 10:13 PM, Semih Salihoglu wrote:

> Sure, I'll get to it this weekend probably.
>
> I don't know what jira is so some information on how to do this
> would be very helpful.
>
> Thank you,
>
> semih
>
> On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt wrote:
> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira wrote:
> I believe the goal of the examples was never to be a complete
> solution to barriers or queues, but just to give a quick bootstrap
> to beginners. It is true, though, that the documentation page does
> not make that claim, and can be misleading.
>
> I see two possible action points out of this discussion:
>
> 1- State clearly in the beginning that the example discussed is not
> correct under the assumption that a process may finish the
> computation before another has started, and the example is there for
> illustration purposes;
> 2- Have another example following the current one that discusses the
> problem and shows how to fix it. This is an interesting option that
> illustrates how one could reason about a solution when developing
> with ZooKeeper.
>
> This (2) sounds much better to me. Semih, would you like to give
> that a try? (updating the docs I mean)
>
> Patrick
>
> If you are interested in helping us fix it, Semih, then you could
> perhaps create a jira and assign yourself to fix it. I can help you
> out.
>
> -Flavio
>
> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>
>> Hi Mahadev,
>>
>> Sorry for the late response. I agree, actually in this other
>> documentation
>> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where
>> there is only the pseudo-code, I think this situation is avoided.
>> Here there is another znode /ready that all nodes have a watch on.
>> And after each node writes their own ephemeral child, they don't
>> wait. They read how many have been written and the last one writes
>> the /ready znode and everyone wakes up. The only race condition in
>> this one is that there can be two nodes trying to write /ready and
>> only one of them will succeed but this is ok.
>>
>> Thank you again,
>>
>> semih
>>
>> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar wrote:
>>
>>> Semih,
>>> You pointed it out right. It is possible to enter into a situation
>>> like that. The recipe does have a bug. It can be fixed with the last
>>> client creating a special znode and every node in the list watching
>>> for that (so it'll be an indication for entering the barrier). no?
>>>
>>> thanks
>>> mahadev
>>>
>>> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu wrote:
>>>> Hi All,
>>>>
>>>> I am new to this group and to ZooKeeper. I was reading the Barrier
>>>> tutorial in one of the ZooKeeper documentations.
>>>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html
>>>> A barrier primitive is exactly how I want to use ZooKeeper. I have
>>>> a question about this example. It's not really a ZooKeeper
>>>> question, it's more a question about the Barrier primitive I
>>>> think. Here it is: In the enter method of this Barrier
>>>> implementation below
>>>>
>>>>     boolean enter() throws KeeperException, InterruptedException {
>>>>         zk.create(root + "/" + name, new byte[0],
>>>>                 Ids.OPEN_ACL_UNSAFE,
>>>>                 CreateMode.EPHEMERAL_SEQUENTIAL);
>>>>         while (true) {
>>>>             synchronized (mutex) {
>>>>                 List<String> list = zk.getChildren(root, true);
>>>>
>>>>                 if (list.size() < size) {
>>>>                     mutex.wait();
>>>>                 } else {
>>>>                     return true;
>>>>                 }
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>> could there be a race condition? Let's say there are two
>>>> machines/nodes: node1 and node2 that will use this code to
>>>> synchronize over ZK. Let's say the following steps take place:
>>>>
>>>> 1. node1 calls the zk.create method and then reads the number of
>>>>    children, and sees that it's 1 and starts waiting.
>>>> 2. node2 calls the zk.create method (doesn't call the
>>>>    zk.getChildren method yet, let's say it's very slow)
>>>> 3. node1 is notified that the number of children on the znode
>>>>    changed, it checks that the size is 2 so it returns from
>>>>    enter(), does its work and then leaves the barrier, deleting
>>>>    its node.
>>>> 4. node2 calls zk.getChildren and because node1 has already left,
>>>>    it sees that the number of children is equal to 1. Since node1
>>>>    will never enter the barrier again, node2 will keep waiting.
>>>>
>>>> Could this scenario happen? If not, what is preventing this? I
>>>> haven't copied the code piece that enters barrier-does work-leaves
>>>> barrier. But in the link I pasted above, it's the
>>>> barrierTest(String args[]) method.
>>>>
>>>> Thank you very much in advance,
>>>>
>>>> semih
>
> flavio junqueira
> research scientist
>
> fpj@yahoo-inc.com
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300 fax (408) 349 3301
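
The four-step interleaving in the original question, and the /ready variant from the recipes page, can be replayed in a small in-memory model. This is only a sketch: the class BarrierRaceSketch, its two methods, and the "ready" flag are hypothetical names made up for illustration, the set of strings stands in for the children of the barrier znode, and none of it calls the ZooKeeper client. It assumes the /ready rule works as described in the thread: the arrival that completes the count creates /ready, and every node treats the existence of /ready (not the child count) as the signal to enter.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * In-memory replay of the two-node interleaving from the thread.
 * Everything here is a hypothetical model; none of it is the ZooKeeper API.
 */
final class BarrierRaceSketch {
    static final int SIZE = 2; // number of nodes the barrier waits for

    /** Tutorial rule: a node enters once it reads children >= SIZE. */
    static boolean[] replayOriginal() {
        Set<String> children = new HashSet<>();
        children.add("node1");                          // step 1: node1 creates its child
        boolean node1Entered = children.size() >= SIZE; // node1 reads 1, so it waits
        children.add("node2");                          // step 2: node2 creates, has not read yet
        node1Entered = children.size() >= SIZE;         // step 3: watch fires, node1 reads 2
        children.remove("node1");                       // step 3: node1 works, leaves, deletes its child
        boolean node2Entered = children.size() >= SIZE; // step 4: node2 finally reads, sees 1
        return new boolean[] {node1Entered, node2Entered};
    }

    /** /ready rule: the arrival that completes the count creates /ready; entering means /ready exists. */
    static boolean[] replayWithReady() {
        Set<String> children = new HashSet<>();
        boolean ready = false;                          // models whether the /ready znode exists
        children.add("node1");                          // node1 creates its child
        ready = ready || children.size() >= SIZE;       // node1 reads 1, does not create /ready
        children.add("node2");                          // node2 creates, still slow to read
        boolean node1Entered = ready;                   // false: node1 waits on /ready, not on the child list
        ready = ready || children.size() >= SIZE;       // node2 reads 2 and creates /ready
        node1Entered = ready;                           // node1's watch on /ready fires
        children.remove("node1");                       // node1 works, leaves, deletes its child
        boolean node2Entered = ready;                   // /ready still exists, so node2 enters too
        return new boolean[] {node1Entered, node2Entered};
    }

    public static void main(String[] args) {
        boolean[] original = replayOriginal();
        boolean[] fixed = replayWithReady();
        System.out.println("original: node1=" + original[0] + ", node2=" + original[1]);
        System.out.println("with /ready: node1=" + fixed[0] + ", node2=" + fixed[1]);
    }
}
```

In this model the original rule lets node1 through and leaves node2 stuck, because node2's only evidence (the child count) disappears when node1 deletes its child. With /ready, node1 cannot get ahead of node2 in the first place, and the signal does not go away when a child is deleted. The model does not cover cleaning up /ready for reuse or the two-writers case mentioned above.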