zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Locking Recipe
Date Thu, 21 Jun 2012 23:20:39 GMT
Mahadev, any insight on this?

On Fri, Jun 15, 2012 at 9:25 AM, Kevin Harms <harms@alcf.anl.gov> wrote:
>
>  I set up a single ZooKeeper instance using the binaries distributed with Ubuntu 12.04.
> I downloaded the 3.3.5 source and compiled the C-based locking recipe. I built this into a
> program of mine and ran into a problem, so I have some questions.
>
>  If I wanted to create 1000 locks, do I set up the locks as follows?
>  /lock/0
>  /lock/1
>  ...
>  /lock/999
>
>  Is this correct?
>
>  I was running an example with two clients competing for one lock, both running on the same
> machine as the ZooKeeper instance. I found that zkr_lock_lock() would often fail
> to acquire the lock, so I put it in a loop with 1000 retries. That makes it work
> most of the time, but other times there is still a failure at zoo_lock.c:301:
>
>                // cannot watch my predecessor i am giving up
>                // we need to be able to watch the predecessor
>                // since if we do not become a leader the others
>                // will keep waiting
> [301]           if (ret != ZOK) {
>                    free_String_vector(vector);
>
>
>  I added a printf to see what ret was, and it was ZNONODE. Looking at the code above
> this spot, get_children is called, the results are sorted, and zoo_wexists is called later.
> It seems reasonable that the state could change between these two calls? I added a statement
> so that if the result is ZNONODE, it does a goto back to where get_children is called
> and runs the algorithm again.
>
>  That change seems to make the code work all the time now, but I'm not sure whether the
> change is correct. I've included the diff below. So is it expected that zkr_lock_lock will
> fail periodically, since it only tries to acquire the lock 4 times?
>
> thanks for any help,
> kevin
>
> --- zoo_lock.c.orig     2012-06-15 00:37:53.880508812 -0500
> +++ zoo_lock.c  2012-06-15 00:41:41.304518262 -0500
> @@ -273,6 +273,7 @@ static int zkr_lock_operation(zkr_lock_m
>             mutex->id = getName(retbuf);
>         }
>
> +tryagain:
>         if (mutex->id != NULL) {
>             ret = ZCONNECTIONLOSS;
>             ret = retry_getchildren(zh, path, vector, ts, retry);
> @@ -299,7 +300,9 @@ static int zkr_lock_operation(zkr_lock_m
>                 // will keep waiting
>                 if (ret != ZOK) {
>                     free_String_vector(vector);
> +                    if (ret == ZNONODE) goto tryagain;
>                     LOG_WARN(("unable to watch my predecessor"));
> +                    printf("zret = %d\n", ret);
>                     ret = zkr_lock_unlock(mutex);
>                     while (ret == 0) {
>                         //we have to give up our leadership
>
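
The retry-on-ZNONODE idea in the quoted patch can be sketched in isolation. This is a minimal, self-contained sketch and not the recipe's actual code: retry_on_nonode, lock_attempt_fn, fake_attempt, and the SKETCH_* codes are hypothetical names introduced here, and the attempt callback stands in for one pass of get_children, sort, and zoo_wexists. Unlike the goto in the patch, it assumes a caller-chosen cap on the number of passes.

```c
#include <assert.h>

/* Stand-ins for the client's result codes (assumption: real code would use
 * ZOK and ZNONODE from zookeeper.h instead). */
enum { SKETCH_ZOK = 0, SKETCH_ZNONODE = -101 };

/* Hypothetical callback: one pass of "list children, sort, watch predecessor". */
typedef int (*lock_attempt_fn)(void *ctx);

/* Re-run the attempt while it reports "no node", at most max_attempts times.
 * Any other result (success or a different error) is returned immediately.
 * The number of passes actually made is stored in *attempts_used if non-NULL. */
int retry_on_nonode(lock_attempt_fn attempt, void *ctx,
                    int max_attempts, int *attempts_used)
{
    int ret = SKETCH_ZNONODE;
    int n = 0;

    assert(max_attempts > 0);
    while (n < max_attempts) {
        ret = attempt(ctx);
        n++;
        if (ret != SKETCH_ZNONODE)
            break;
    }
    if (attempts_used != 0)
        *attempts_used = n;
    return ret;
}

/* Fake attempt for demonstration only: the predecessor "vanishes" a fixed
 * number of times before the watch can finally be set. */
struct fake_state { int vanishes_left; };

int fake_attempt(void *ctx)
{
    struct fake_state *s = (struct fake_state *)ctx;
    if (s->vanishes_left > 0) {
        s->vanishes_left--;
        return SKETCH_ZNONODE;
    }
    return SKETCH_ZOK;
}
```

With a cap, a caller still gets an error back if the children keep changing between the listing and the watch, instead of looping indefinitely. Whether a cap or an unbounded retry is the right behavior for the recipe is exactly the open question in this thread.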
