zookeeper-user mailing list archives

From Flavio Junqueira <...@yahoo-inc.com>
Subject Re: Question about the Barrier Java example on the ZooKeeper documentation
Date Wed, 09 Mar 2011 09:30:36 GMT
Hi Semih, Jira is the system we use to report and discuss zookeeper  
issues:

	https://issues.apache.org/jira/browse/ZOOKEEPER

Once you have an account, you can create a new issue, describe it, and  
propose a fix to the problem at hand.

-Flavio

On Mar 8, 2011, at 10:13 PM, Semih Salihoglu wrote:

> Sure, I'll get to it this weekend probably.
>
> I don't know what Jira is, so some information on how to do this  
> would be very helpful.
>
> Thank you,
>
> semih
>
> On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt <phunt@apache.org> wrote:
> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira <fpj@yahoo-inc.com>  
> wrote:
> I believe the goal of the examples was never to be complete  
> solutions to barriers or queues, but just to give a quick bootstrap  
> to beginners. It is true, though, that the documentation page does  
> not say so, and can be misleading.
>
> I see two possible action points out of this discussion:
> 	
> 1- State clearly in the beginning that the example discussed is not  
> correct under the assumption that a process may finish the  
> computation before another has started, and the example is there for  
> illustration purposes;
> 2- Have another example following the current one that discusses the  
> problem and shows how to fix it. This is an interesting option that  
> illustrates how one could reason about a solution when developing  
> with zookeeper.
>
>
> This (2) sounds much better to me. Semih, would you like to give  
> that a try? (updating the docs I mean)
>
> Patrick
>
> If you are interested in helping us fix it, Semih, then you could  
> perhaps create a jira and assign yourself to fix it. I can help you  
> out.
>
> -Flavio
>
> On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:
>
>> Hi Mahadev,
>>
>> Sorry for the late response. I agree. Actually, in this other
>> documentation,
>> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where
>> there is only the pseudo-code, I think this situation is avoided.
>> There is another znode, /ready, that all nodes have a watch on. And
>> after each node writes their own ephemeral child, they don't wait.
>> They read how many have been written, and the last one writes the
>> /ready znode and everyone wakes up. The only race condition in this
>> one is that there can be two nodes trying to write /ready and only
>> one of them will succeed, but this is ok.
>>
>> Thank you again,
>>
>> semih
>>
>> On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar <mahadev@apache.org>  
>> wrote:
>>
>>> Semih,
>>> You pointed it out right. It is possible to enter into a situation
>>> like that. The recipe does have a bug. It can be fixed with the last
>>> client creating a special znode and every node in the list watching
>>> for that (so it'll be an indication for entering the barrier). no?
>>>
>>> thanks
>>> mahadev
>>>
>>> On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <semih@stanford.edu>
>>> wrote:
>>>> Hi All,
>>>>
>>>> I am new to this group and to ZooKeeper. I was reading the Barrier
>>>> tutorial in the ZooKeeper documentation:
>>>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html
>>>> A barrier primitive is exactly how I want to use ZooKeeper. I have a
>>>> question about this example. It's not really a ZooKeeper question,
>>>> it's more a question about the Barrier primitive I think. Here it is:
>>>> In the enter method of this Barrier implementation below
>>>>
>>>> boolean enter() throws KeeperException, InterruptedException {
>>>>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>>>>             CreateMode.EPHEMERAL_SEQUENTIAL);
>>>>     while (true) {
>>>>         synchronized (mutex) {
>>>>             List<String> list = zk.getChildren(root, true);
>>>>
>>>>             if (list.size() < size) {
>>>>                 mutex.wait();
>>>>             } else {
>>>>                 return true;
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> could there be a race condition? Let's say there are two
>>>> machines/nodes: node1 and node2 that will use this code to  
>>>> synchronize
>>>> over ZK. Let's say the following steps take place:
>>>>
>>>>
>>>>  1. node1 calls the zk.create method and then reads the number of
>>>> children, and sees that it's 1 and starts waiting.
>>>>  2. node2 calls the zk.create method (doesn't call the
>>>> zk.getChildren method yet, let's say it's very slow)
>>>>  3. node1 is notified that the number of children on the znode
>>>> changed, it checks that the size is 2 so it enters the barrier, it
>>>> does its work and then leaves the barrier, deleting its node.
>>>>  4. node2 calls zk.getChildren and because node1 has already left,
>>>> it sees that the number of children is equal to 1. Since node1 will
>>>> never enter the barrier again, node2 will keep waiting.
>>>>
>>>> Could this scenario happen? If not, what is preventing this? I  
>>>> haven't
>>>> copied the code piece that enters barrier-does work-leaves barrier.
>>>> But in the link I pasted above, it's the barrierTest(String args[])
>>>> method.
>>>>
>>>> Thank you very much in advance,
>>>>
>>>> semih
>>>>
>>>
>
> flavio
> junqueira
>
> research scientist
>
> fpj@yahoo-inc.com
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301
>
>
>
>
>
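
The four-step scenario from the original question, and the /ready fix
suggested in the replies, can be replayed with a small in-memory model.
This is only a sketch for reasoning about the race: it does not use the
ZooKeeper client API, and the class and method names (BarrierRaceModel,
register, countAndSignal, and so on) are hypothetical. The boolean
"ready" stands in for the existence of a /ready znode, and the set of
names stands in for the ephemeral children of the barrier root.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the barrier znode; NOT the ZooKeeper API.
final class BarrierRaceModel {
    static final int SIZE = 2;                          // processes expected at the barrier
    private final Set<String> children = new HashSet<>(); // models ephemeral children
    private boolean ready = false;                      // models existence of /ready

    void register(String name) { children.add(name); }  // models zk.create(...)
    void leave(String name) { children.remove(name); }  // models deleting own node

    // Check used by the tutorial's enter(): enough children right now?
    boolean enoughChildren() { return children.size() >= SIZE; }

    // /ready variant: count children once; the arrival that completes the
    // group creates /ready. Returns whether the caller may proceed.
    boolean countAndSignal() {
        if (children.size() >= SIZE) {
            ready = true; // a second creator would just find it already exists
        }
        return ready;
    }

    boolean isReady() { return ready; }

    // Replays steps 1-4 with the tutorial's check; returns whether node2 gets in.
    static boolean naiveNode2Enters() {
        BarrierRaceModel zk = new BarrierRaceModel();
        zk.register("node1");          // step 1: node1 registers, sees 1 child, waits
        zk.register("node2");          // step 2: node2 registers, has not read yet
        if (zk.enoughChildren()) {     // step 3: node1 sees 2 children, works,
            zk.leave("node1");         //         and deletes its node
        }
        return zk.enoughChildren();    // step 4: node2 finally reads the children
    }

    // Same interleaving with the /ready variant; returns whether node2 gets in.
    static boolean readyNode2Enters() {
        BarrierRaceModel zk = new BarrierRaceModel();
        zk.register("node1");
        zk.countAndSignal();           // node1 counts 1 child, waits on /ready
        zk.register("node2");          // node1 is not woken: it watches /ready only
        boolean node2In = zk.countAndSignal(); // node2 counts 2, creates /ready
        if (zk.isReady()) {
            zk.leave("node1");         // node1 wakes up, works, deletes its node
        }
        return node2In && zk.isReady(); // /ready outlives node1's child
    }

    public static void main(String[] args) {
        System.out.println("tutorial check lets node2 in: " + naiveNode2Enters());
        System.out.println("/ready check lets node2 in: " + readyNode2Enters());
    }
}
```

In this model the tutorial's check leaves node2 waiting, because the
count it depends on shrinks when node1 leaves, while the /ready flag is
never cleared by a departing node. A real implementation would also need
to set the watch on /ready before counting the children and to handle
the case where /ready already exists.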
