zookeeper-user mailing list archives

From Semih Salihoglu <se...@stanford.edu>
Subject Re: Question about the Barrier Java example on the ZooKeeper documentation
Date Tue, 08 Mar 2011 21:13:06 GMT
Sure, I'll get to it this weekend probably.

I don't know what jira is, so some information on how to do this would be
very helpful.

Thank you,

semih

On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt <phunt@apache.org> wrote:

> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>
>> I believe the goal of the examples was never to be a complete solution to
>> barriers or queues, but just to give a quick bootstrap to beginners. It is
>> true, though, that the documentation page does not make that claim, and can
>> be misleading.
>>
>> I see two possible action points out of this discussion:
>> 1- State clearly in the beginning that the example discussed is not
>> correct under the assumption that a process may finish the computation
>> before another has started, and the example is there for illustration
>> purposes;
>> 2- Have another example following the current one that discusses the
>> problem and shows how to fix it. This is an interesting option that
>> illustrates how one could reason about a solution when developing with
>> zookeeper.
>>
>>
> This (2) sounds much better to me. Semih, would you like to give that a
> try? (updating the docs I mean)
>
> Patrick
>
>
>> If you are interested in helping us fix it, Semih, then you could perhaps
>> create a jira and assign yourself to fix it. I can help you out.
>>
>> -Flavio
>>
>> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>>
>> Hi Mahadev,
>>
>> Sorry for the late response. I agree; actually, in this other documentation,
>> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there is
>> only the pseudo-code, I think this situation is avoided. There is another
>> znode, /ready, that all nodes have a watch on. After each node writes its
>> own ephemeral child, they don't wait. They read how many children have been
>> written, and the last one writes the /ready znode and everyone wakes up.
>> The only race condition in this one is that two nodes can try to write
>> /ready and only one of them will succeed, but this is ok.
>>
>> Thank you again,
>>
>> semih
>>
>> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar <mahadev@apache.org> wrote:
>>
>> Semih,
>>
>> You pointed it out right. It is possible to enter into a situation
>> like that. The recipe does have a bug. It can be fixed with the last
>> client creating a special znode and every node in the list watching
>> for that (so it'll be an indication for entering the barrier). no?
>>
>> thanks
>> mahadev
>>
>>
>> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <semih@stanford.edu> wrote:
>>
>> Hi All,
>>
>> I am new to this group and to ZooKeeper. I was reading the Barrier tutorial
>> in one of the ZooKeeper documentation pages:
>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html .
>> A barrier primitive is exactly how I want to use ZooKeeper. I have a
>> question about this example. It's not really a ZooKeeper question, it's
>> more a question about the Barrier primitive I think. Here it is: In the
>> enter method of this Barrier implementation below
>>
>>
>> boolean enter() throws KeeperException, InterruptedException {
>>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>>             CreateMode.EPHEMERAL_SEQUENTIAL);
>>     while (true) {
>>         synchronized (mutex) {
>>             List<String> list = zk.getChildren(root, true);
>>
>>             if (list.size() < size) {
>>                 mutex.wait();
>>             } else {
>>                 return true;
>>             }
>>         }
>>     }
>> }
>>
>>
>> could there be a race condition? Let's say there are two
>> machines/nodes, node1 and node2, that will use this code to synchronize
>> over ZK. Let's say the following steps take place:
>>
>>  1. node1 calls the zk.create method and then reads the number of
>>     children, sees that it's 1, and starts waiting.
>>  2. node2 calls the zk.create method (it doesn't call the
>>     zk.getChildren method yet; let's say it's very slow).
>>  3. node1 is notified that the number of children on the znode
>>     changed. It checks that the size is 2, so it enters the barrier,
>>     does its work, and then leaves the barrier, deleting its node.
>>  4. node2 calls zk.getChildren and, because node1 has already left,
>>     it sees that the number of children is equal to 1. Since node1 will
>>     never enter the barrier again, node2 will keep waiting.
>>
>> Could this scenario happen? If not, what is preventing this? I haven't
>> copied the code piece that enters barrier-does work-leaves barrier.
>> But in the link I pasted above, it's the barrierTest(String args[])
>> method.
>>
>> Thank you very much in advance,
>>
>> semih
>>
>>
>>   *flavio*
>> *junqueira*
>>
>> research scientist
>>
>> fpj@yahoo-inc.com
>> direct +34 93-183-8828
>>
>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>> phone (408) 349 3300    fax (408) 349 3301
>>
>>
>>
>
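
To make the interleaving in steps 1-4 concrete, here is a minimal sketch that replays it twice: once with the tutorial's "enter when the child count reaches size" check, and once with the /ready marker idea from the recipes page. It does not use the ZooKeeper client at all. FakeTree, BarrierRaceSketch, the /b1 root, and the placement of /ready outside that root are assumptions made up for illustration, and the real recipe relies on watches rather than the straight-line replay shown here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a znode tree. This is NOT the
// ZooKeeper client API; it only models create/delete/exists/child counts.
class FakeTree {
    private final Set<String> paths = new HashSet<>();

    void create(String path) { paths.add(path); }
    void delete(String path) { paths.remove(path); }
    boolean exists(String path) { return paths.contains(path); }

    // Number of direct children of parent.
    int countChildren(String parent) {
        String prefix = parent + "/";
        int n = 0;
        for (String p : paths) {
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                n++;
            }
        }
        return n;
    }
}

class BarrierRaceSketch {
    static final String ROOT = "/b1";      // assumed barrier root
    static final String READY = "/ready";  // assumed location of the marker
    static final int SIZE = 2;

    // Steps 1-4 with the tutorial's check: enter once children >= SIZE.
    // Returns whether node2 gets past enter().
    static boolean tutorialCheckLetsNode2Enter() {
        FakeTree zk = new FakeTree();
        zk.create(ROOT + "/node1");                              // step 1
        boolean node1Entered = zk.countChildren(ROOT) >= SIZE;   // 1 child: node1 waits
        zk.create(ROOT + "/node2");                              // step 2
        node1Entered = zk.countChildren(ROOT) >= SIZE;           // step 3: 2 children
        if (node1Entered) {
            zk.delete(ROOT + "/node1");                          // node1 works, then leaves
        }
        return zk.countChildren(ROOT) >= SIZE;                   // step 4: node2 sees 1
    }

    // Same interleaving, but a waiting node is released only by READY,
    // which is written by whichever node observes the full child count.
    static boolean readyMarkerLetsNode2Enter() {
        FakeTree zk = new FakeTree();
        zk.create(ROOT + "/node1");                              // step 1
        if (zk.countChildren(ROOT) >= SIZE) zk.create(READY);    // 1 child: node1 waits on READY
        zk.create(ROOT + "/node2");                              // step 2
        if (zk.exists(READY)) {
            zk.delete(ROOT + "/node1");                          // not reached: node1 still blocked
        }
        if (zk.countChildren(ROOT) >= SIZE) zk.create(READY);    // step 4: node2 sees 2 children
        return zk.exists(READY);
    }

    public static void main(String[] args) {
        System.out.println("tutorial check, node2 enters: " + tutorialCheckLetsNode2Enter());
        System.out.println("ready marker, node2 enters: " + readyMarkerLetsNode2Enter());
    }
}
```

In this replay the first method returns false (node2 is stranded) and the second returns true, because node1 cannot delete its child until /ready exists, so node2's late read still sees both children.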
