hadoop-zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: Locking and Partial Failure
Date Mon, 31 May 2010 19:34:13 GMT
Hi Charles, any luck with this? Re the issues you found with the recipes 
please enter a JIRA, it would be good to address the problem(s) you found.
https://issues.apache.org/jira/browse/ZOOKEEPER

re use of session/thread id, might you use some sort of unique token 
that's dynamically assigned to the thread making a request on the shared 
session? The calling code could then be identified by that token in 
recovery cases.

Patrick

On 05/28/2010 08:28 AM, Charles Gordon wrote:
> Hello,
>
> I am new to using Zookeeper and I have a quick question about the locking
> recipe that can be found here:
>
> http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#sc_recipes_Locks
>
> It appears to me that there is a flaw in this algorithm related to partial
> failure, and I am curious to know how to fix it.
>
> The algorithm follows these steps:
>
>   1. Call "create()" with a pathname like "/some/path/to/parent/child-lock-".
>   2. Call "getChildren()" on the lock node without the watch flag set.
>   3. If the path created in step (1) has the lowest sequence number, you are
> the master (skip the next steps).
>   4. Otherwise, call "exists()" with the watch flag set on the child with the
> next lowest sequence number.
>   5. If "exists()" returns false, go to step (2), otherwise wait for a
> notification from the path, then go to step (2).
>
> The scenario that seems to be faulty is a partial failure in step (1).
> Assume that my client program follows step (1) and calls "create()". Assume
> that the call succeeds on the Zookeeper server, but there is a
> ConnectionLoss event right as the server sends the response (e.g., a network
> partition, some dropped packets, the ZK server goes down, etc). Assume
> further that the client immediately reconnects, so the session is not timed
> out. At this point there is a child node that was created by my client, but
> that my client does not know about (since it never received the response).
> Since my client doesn't know about the child, it won't know to watch the
> previous child to it, and it also won't know to delete it. That means all
> clients using that lock will fail to make progress as soon as the orphaned
> child is the lowest sequence number. This state will continue until my
> client closes it's session (which may be a while if I have a long lived
> session, as I would like to have). Correctness is maintained here, but
> live-ness is not.
>
> The only good solution I have found for this problem is to establish a new
> session with Zookeeper before acquiring a lock, and to close that session
> immediately upon any connection loss in step (1). If everything works, the
> session could be re-used, but you'd need to guarantee that the session was
> closed if there was a failure during creation of the child node. Are there
> other good solutions?
>
> I looked at the sample code that comes with the Zookeeper distribution (I'm
> using 3.2.2 right now), and it uses the current session ID as part of the
> child node name. Then, if there is a failure during creation, it tries to
> look up the child using that session ID. This isn't really helpful in the
> environment I'm using, where a single session could be shared by multiple
> threads, any of which could request a lock (so I can't uniquely identify a
> lock by session ID). I could use thread ID, but then I run the risk of a
> thread being reused and getting the wrong lock. In any case, there is also
> the risk that a second failure prevents me from looking up the lock after a
> connection loss, so I'm right back to an orphaned lock child, as above. I
> could, presumably, be careful enough with try/catch logic to prevent even
> that case, but it makes for pretty bug-prone code. Also, as a side note,
> that code appears to be sorting the child nodes by the session ID first,
> then the sequence number, which could cause locks to be ordered incorrectly.
>
> Thanks for any help you can provide!
>
> Charles Gordon
>

Mime
View raw message