hadoop-zookeeper-user mailing list archives

From Charles Gordon <charles.gor...@gmail.com>
Subject Re: Locking and Partial Failure
Date Thu, 03 Jun 2010 01:35:22 GMT
It does look like a special case of that JIRA item. I read back through the
Chubby paper and it sounds like they solve this problem using a similar
mechanism. They just block the client until it either manages to
re-establish the session or the session timeout expires (at which point
they return an error to the application). That seems like the right
thing to do here as well.
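
A minimal sketch of that blocking behavior with the Java client might look
like the following (the class and helper names are mine, not from the Chubby
paper or the ZooKeeper distribution; the watcher must be registered as the
session's default watcher):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    // Sketch only: block a caller after a connection loss until the session
    // is either re-established or has expired.
    public class SessionGate implements Watcher {
        private final Object mutex = new Object();
        private volatile Event.KeeperState state = Event.KeeperState.Disconnected;

        public void process(WatchedEvent event) {
            synchronized (mutex) {
                state = event.getState();
                mutex.notifyAll();
            }
        }

        // Returns once reconnected; throws if the session expired first.
        public void awaitReconnect()
                throws KeeperException.SessionExpiredException, InterruptedException {
            synchronized (mutex) {
                while (state == Event.KeeperState.Disconnected) {
                    mutex.wait();
                }
                if (state == Event.KeeperState.Expired) {
                    throw new KeeperException.SessionExpiredException();
                }
            }
        }
    }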

I can solve this problem for myself by treating a CONNECTIONLOSS event
(while holding a lock) as the end of the session and clearing my
application state. It isn't ideal, but it does the job safely while
guaranteeing progress on the locks.
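
Concretely, something like this sketch (the bookkeeping structure is
hypothetical):

    // Hypothetical bookkeeping: paths of locks this process believes it holds.
    private final Set<String> heldLocks = new HashSet<String>();

    // On any connection loss while holding a lock: give up the session so
    // its ephemeral lock nodes are deleted, then clear local state.
    void handleConnectionLoss(ZooKeeper zk) throws InterruptedException {
        zk.close();          // explicit close ends the session server-side
        heldLocks.clear();   // callers must re-acquire on a fresh session
    }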

CG.

On Mon, May 31, 2010 at 1:54 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

>
> Isn't this a special case of
> https://issues.apache.org/jira/browse/ZOOKEEPER-22 ?
>
> Is there any progress on this?
>
>
> On Mon, May 31, 2010 at 12:34 PM, Patrick Hunt <phunt@apache.org> wrote:
>
>> Hi Charles, any luck with this? Re the issues you found with the
>> recipes, please enter a JIRA; it would be good to address the
>> problem(s) you found.
>> https://issues.apache.org/jira/browse/ZOOKEEPER
>>
>> Re use of session/thread id: might you use some sort of unique token
>> that's dynamically assigned to the thread making a request on the
>> shared session? The calling code could then be identified by that
>> token in recovery cases.
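>>
>> For example (a sketch only; the parent path and names are made up),
>> the requesting thread could embed a fresh token in the node name and
>> use it to find the node again after a connection loss:
>>
>>     String token = java.util.UUID.randomUUID().toString();
>>     String path = zk.create("/locks/x-" + token + "-", new byte[0],
>>             ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
>>     // After a ConnectionLossException, scan the children: any child
>>     // whose name contains the token is ours, whether or not the
>>     // create() response ever arrived.
>>     for (String child : zk.getChildren("/locks", false)) {
>>         if (child.contains(token)) { /* recovered our lock node */ }
>>     }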
>>
>> Patrick
>>
>> On 05/28/2010 08:28 AM, Charles Gordon wrote:
>>
>>> Hello,
>>>
>>> I am new to using Zookeeper and I have a quick question about the locking
>>> recipe that can be found here:
>>>
>>>
>>> http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#sc_recipes_Locks
>>>
>>> It appears to me that there is a flaw in this algorithm related to
>>> partial failure, and I am curious to know how to fix it.
>>>
>>> The algorithm follows these steps:
>>>
>>>  1. Call "create()" with a pathname like "/some/path/to/parent/child-lock-".
>>>  2. Call "getChildren()" on the lock node without the watch flag set.
>>>  3. If the path created in step (1) has the lowest sequence number,
>>>     you are the master (skip the next steps).
>>>  4. Otherwise, call "exists()" with the watch flag set on the child
>>>     with the next lowest sequence number.
>>>  5. If "exists()" returns false, go to step (2); otherwise wait for a
>>>     notification from the path, then go to step (2).
>>>
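>>> In code, the happy path of those steps looks roughly like this (a
>>> sketch only; the parent path is made up and error handling is
>>> omitted):
>>>
>>>     String me = zk.create("/locks/child-lock-", new byte[0],         // step 1
>>>             ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
>>>     String myName = me.substring("/locks/".length());
>>>     while (true) {
>>>         List<String> kids = zk.getChildren("/locks", false);         // step 2
>>>         Collections.sort(kids);  // zero-padded sequence sorts lexicographically
>>>         int i = kids.indexOf(myName);
>>>         if (i == 0) break;                                           // step 3: lock held
>>>         String prev = kids.get(i - 1);                               // step 4
>>>         if (zk.exists("/locks/" + prev, true) != null) {
>>>             awaitWatchEvent();  // step 5 (hypothetical helper that blocks on the watch)
>>>         }
>>>     }
>>>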
>>> The scenario that seems to be faulty is a partial failure in step (1).
>>> Assume that my client program follows step (1) and calls "create()".
>>> Assume that the call succeeds on the Zookeeper server, but there is a
>>> ConnectionLoss event right as the server sends the response (e.g., a
>>> network partition, some dropped packets, the ZK server goes down,
>>> etc.). Assume further that the client immediately reconnects, so the
>>> session is not timed out. At this point there is a child node that
>>> was created by my client, but that my client does not know about
>>> (since it never received the response). Since my client doesn't know
>>> about the child, it won't know to watch the child before it, and it
>>> also won't know to delete it. That means all clients using that lock
>>> will fail to make progress as soon as the orphaned child has the
>>> lowest sequence number. This state will continue until my client
>>> closes its session (which may be a while, since I would like to have
>>> a long-lived session). Correctness is maintained here, but liveness
>>> is not.
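>>>
>>> To make the window concrete, the problem is the create() call itself
>>> (sketch; the parent path is made up):
>>>
>>>     try {
>>>         zk.create("/locks/child-lock-", new byte[0],
>>>                 ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
>>>     } catch (KeeperException.ConnectionLossException e) {
>>>         // The server may or may not have applied the create. If it
>>>         // did, the session is still alive after reconnecting, so the
>>>         // orphaned child stays and blocks everyone queued behind it.
>>>     }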
>>>
>>> The only good solution I have found for this problem is to establish
>>> a new session with Zookeeper before acquiring a lock, and to close
>>> that session immediately upon any connection loss in step (1). If
>>> everything works, the session could be re-used, but you'd need to
>>> guarantee that the session was closed if there was a failure during
>>> creation of the child node. Are there other good solutions?
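>>>
>>> Roughly like this (a sketch; the connect string, timeout, and watcher
>>> are placeholders):
>>>
>>>     ZooKeeper session = new ZooKeeper("host:2181", 30000, watcher);
>>>     try {
>>>         session.create("/locks/child-lock-", new byte[0],
>>>                 ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
>>>     } catch (KeeperException.ConnectionLossException e) {
>>>         session.close();  // ends the session, so an orphaned ephemeral child is deleted
>>>         throw e;
>>>     }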
>>>
>>> I looked at the sample code that comes with the Zookeeper
>>> distribution (I'm using 3.2.2 right now), and it uses the current
>>> session ID as part of the child node name. Then, if there is a
>>> failure during creation, it tries to look up the child using that
>>> session ID. This isn't really helpful in the environment I'm using,
>>> where a single session could be shared by multiple threads, any of
>>> which could request a lock (so I can't uniquely identify a lock by
>>> session ID). I could use thread ID, but then I run the risk of a
>>> thread being reused and getting the wrong lock. In any case, there is
>>> also the risk that a second failure prevents me from looking up the
>>> lock after a connection loss, so I'm right back to an orphaned lock
>>> child, as above. I could, presumably, be careful enough with
>>> try/catch logic to prevent even that case, but it makes for pretty
>>> bug-prone code. Also, as a side note, that code appears to be sorting
>>> the child nodes by the session ID first, then the sequence number,
>>> which could cause locks to be ordered incorrectly.
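>>>
>>> For what it's worth, sorting on the trailing sequence number alone
>>> would look something like this (sketch; assumes the zero-padded
>>> 10-digit suffix the server appends to sequential nodes):
>>>
>>>     Collections.sort(children, new Comparator<String>() {
>>>         public int compare(String a, String b) {
>>>             // compare only the sequence suffix, ignoring session/thread ids
>>>             return a.substring(a.length() - 10)
>>>                     .compareTo(b.substring(b.length() - 10));
>>>         }
>>>     });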
>>>
>>> Thanks for any help you can provide!
>>>
>>> Charles Gordon
>>>
>>>
>
