From: Ted Dunning
Date: Mon, 31 May 2010 13:54:37 -0700
Subject: Re: Locking and Partial Failure
To: zookeeper-user@hadoop.apache.org
Cc: Charles Gordon

Isn't this a special case of
https://issues.apache.org/jira/browse/ZOOKEEPER-22 ?

Is there any progress on this?

On Mon, May 31, 2010 at 12:34 PM, Patrick Hunt wrote:

> Hi Charles, any luck with this? Re the issues you found with the recipes,
> please enter a JIRA; it would be good to address the problem(s) you found.
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Re the use of session/thread id: might you use some sort of unique token
> that's dynamically assigned to the thread making a request on the shared
> session? The calling code could then be identified by that token in
> recovery cases.
>
> Patrick
>
> On 05/28/2010 08:28 AM, Charles Gordon wrote:
>
>> Hello,
>>
>> I am new to using ZooKeeper and I have a quick question about the
>> locking recipe that can be found here:
>>
>> http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#sc_recipes_Locks
>>
>> It appears to me that there is a flaw in this algorithm related to
>> partial failure, and I am curious to know how to fix it.
>>
>> The algorithm follows these steps (a rough Java sketch follows the
>> list):
>>
>> 1. Call "create()" with a pathname like
>>    "/some/path/to/parent/child-lock-".
>> 2. Call "getChildren()" on the lock node without the watch flag set.
>> 3. If the path created in step (1) has the lowest sequence number, you
>>    are the master (skip the next steps).
>> 4. Otherwise, call "exists()" with the watch flag set on the child with
>>    the next lowest sequence number.
>> 5. If "exists()" returns false, go to step (2); otherwise, wait for a
>>    notification from the path, then go to step (2).
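>>
>> For concreteness, here is a minimal sketch of that loop in Java. The
>> class and variable names are hypothetical, error handling is elided,
>> and I am assuming the children share a fixed prefix so lexicographic
>> order matches sequence order:
>>
>>     import java.util.Collections;
>>     import java.util.List;
>>     import org.apache.zookeeper.*;
>>     import org.apache.zookeeper.data.Stat;
>>
>>     public class LockSketch {
>>         // Steps (1)-(5) of the recipe; not the shipped recipe code.
>>         public static String acquire(ZooKeeper zk, String dir)
>>                 throws Exception {
>>             // Step 1: create a sequential ephemeral child.
>>             String path = zk.create(dir + "/child-lock-", new byte[0],
>>                     ZooDefs.Ids.OPEN_ACL_UNSAFE,
>>                     CreateMode.EPHEMERAL_SEQUENTIAL);
>>             String me = path.substring(path.lastIndexOf('/') + 1);
>>             while (true) {
>>                 // Step 2: list children without setting a watch.
>>                 List<String> kids = zk.getChildren(dir, false);
>>                 Collections.sort(kids);
>>                 int i = kids.indexOf(me);
>>                 // Step 3: lowest sequence number -> we hold the lock.
>>                 if (i == 0) {
>>                     return path;
>>                 }
>>                 // Step 4: watch only the next-lowest child.
>>                 final boolean[] fired = { false };
>>                 Stat s = zk.exists(dir + "/" + kids.get(i - 1),
>>                         new Watcher() {
>>                             public void process(WatchedEvent e) {
>>                                 synchronized (fired) {
>>                                     fired[0] = true;
>>                                     fired.notifyAll();
>>                                 }
>>                             }
>>                         });
>>                 // Step 5: if that child is already gone, re-check;
>>                 // otherwise wait for the notification, then re-check.
>>                 if (s != null) {
>>                     synchronized (fired) {
>>                         while (!fired[0]) fired.wait();
>>                     }
>>                 }
>>             }
>>         }
>>     }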
>>
>> The scenario that seems to be faulty is a partial failure in step (1).
>> Assume that my client program follows step (1) and calls "create()".
>> Assume that the call succeeds on the ZooKeeper server, but there is a
>> ConnectionLoss event right as the server sends the response (e.g., a
>> network partition, some dropped packets, the ZK server goes down, etc.).
>> Assume further that the client immediately reconnects, so the session is
>> not timed out. At this point there is a child node that was created by
>> my client, but that my client does not know about (since it never
>> received the response). Since my client doesn't know about the child, it
>> won't know to watch the child before it, and it also won't know to
>> delete it. That means all clients using that lock will fail to make
>> progress as soon as the orphaned child has the lowest sequence number.
>> This state will continue until my client closes its session (which may
>> be a while, since I would like to have a long-lived session).
>> Correctness is maintained here, but liveness is not.
>>
>> The only good solution I have found for this problem is to establish a
>> new session with ZooKeeper before acquiring a lock, and to close that
>> session immediately upon any connection loss in step (1). If everything
>> works, the session could be re-used, but you'd need to guarantee that
>> the session was closed if there was a failure during creation of the
>> child node. Are there other good solutions?
>>
>> I looked at the sample code that comes with the ZooKeeper distribution
>> (I'm using 3.2.2 right now), and it uses the current session ID as part
>> of the child node name. Then, if there is a failure during creation, it
>> tries to look up the child using that session ID. This isn't really
>> helpful in the environment I'm using, where a single session can be
>> shared by multiple threads, any of which could request a lock (so I
>> can't uniquely identify a lock by session ID). I could use the thread
>> ID, but then I run the risk of a thread being reused and getting the
>> wrong lock. In any case, there is also the risk that a second failure
>> prevents me from looking up the lock after a connection loss, so I'm
>> right back to an orphaned lock child, as above. I could, presumably, be
>> careful enough with try/catch logic to prevent even that case, but it
>> makes for pretty bug-prone code; a sketch of what I mean follows.
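>>
>> To make the lookup-after-failure idea concrete, here is a hypothetical
>> sketch that tags each request with a unique token instead of the
>> session ID, so it also works when several threads share one session
>> (retries of "getChildren()" after the reconnect are elided):
>>
>>     import java.util.List;
>>     import java.util.UUID;
>>     import org.apache.zookeeper.*;
>>
>>     public class CreateWithRecovery {
>>         public static String createLockNode(ZooKeeper zk, String dir)
>>                 throws Exception {
>>             // A per-request token identifies this create() attempt
>>             // even if the response is lost and the session is shared.
>>             String token = UUID.randomUUID().toString();
>>             String prefix = dir + "/lock-" + token + "-";
>>             while (true) {
>>                 try {
>>                     return zk.create(prefix, new byte[0],
>>                             ZooDefs.Ids.OPEN_ACL_UNSAFE,
>>                             CreateMode.EPHEMERAL_SEQUENTIAL);
>>                 } catch (KeeperException.ConnectionLossException e) {
>>                     // The create() may or may not have succeeded on
>>                     // the server. After reconnecting, look for our
>>                     // token among the children.
>>                     List<String> kids = zk.getChildren(dir, false);
>>                     for (String k : kids) {
>>                         if (k.contains(token)) {
>>                             return dir + "/" + k;
>>                         }
>>                     }
>>                     // Token not found: the create() never happened;
>>                     // retry.
>>                 }
>>             }
>>         }
>>     }
>>
>> (A second connection loss during the lookup still needs the same retry
>> treatment, which is exactly the try/catch bulk I was worried about.)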
>>
>> Also, as a side note, that code appears to sort the child nodes by
>> session ID first and only then by sequence number, which could cause
>> locks to be granted out of order; a sketch of the ordering I would
>> expect follows.
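>>
>> A minimal sketch of the comparison I would expect, assuming node names
>> that end in "-<sequence>" (the parsing here is hypothetical):
>>
>>     import java.util.Comparator;
>>
>>     class SequenceOrder {
>>         // Order lock nodes by trailing sequence number only, ignoring
>>         // any session-ID component earlier in the name.
>>         static final Comparator<String> BY_SEQUENCE =
>>                 new Comparator<String>() {
>>             public int compare(String a, String b) {
>>                 long sa = Long.parseLong(
>>                         a.substring(a.lastIndexOf('-') + 1));
>>                 long sb = Long.parseLong(
>>                         b.substring(b.lastIndexOf('-') + 1));
>>                 return sa < sb ? -1 : (sa == sb ? 0 : 1);
>>             }
>>         };
>>     }
>>
>> Thanks for any help you can provide!
>>
>> Charles Gordon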