Message-ID: <4BE09261.5040401@apache.org>
Date: Tue, 04 May 2010 14:32:17 -0700
From: Patrick Hunt
To: zookeeper-user@hadoop.apache.org
CC: Kapil Thangavelu
Subject: Re: avoiding deadlocks on client handle close w/ python/c api

Thanks Kapil,

Mahadev, perhaps you could take a look at this as well?

Patrick

On 05/04/2010 06:36 AM, Kapil Thangavelu wrote:
> I've constructed a simple example, using just the zkpython library with
> condition variables, that will deadlock. I've filed a new ticket for it:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-763
>
> The gdb stack traces look suspiciously like the ones in 591, but sans the
> watchers.
> https://issues.apache.org/jira/browse/ZOOKEEPER-591
>
> The attached example on the ticket will deadlock in zk 3.3.0 (which has the
> fix for 591) and trunk.
>
> -kapil
>
> On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu wrote:
>
>> Hi Folks,
>>
>> I'm constructing an async api on top of the zookeeper python bindings for
>> twisted. The intent is a thin wrapper around the existing async api that
>> integrates with the twisted python event loop
>> (http://www.twistedmatrix.com).
>>
>> One issue I'm running into while developing unit tests: deadlocks occur
>> if we attempt to close a handle while there are any outstanding async
>> requests (aget, acreate, etc). Normally on close both the io thread and
>> the completion thread are terminated and joined; however, with
>> outstanding async requests the completion thread won't be in a joinable
>> state, and we effectively hang when the main thread does the join.
>>
>> I'm curious whether this would be considered a bug. Afaics the ideal
>> behavior on close of a handle would be to clear out any remaining
>> callbacks and let the completion thread terminate.
>>
>> I've tried adding some bookkeeping to the api to guard against closing
>> while there is an outstanding completion request, but it's an imperfect
>> solution due to the nature of the event loop integration.
>> The problem is that the python callback invoked by the completion thread
>> in turn schedules a function for the main thread. In twisted the api for
>> this is implemented by appending the function to a list attribute on the
>> reactor and then writing a byte to a pipe to wake up the main thread. If
>> a thread switch to the main thread occurs before the completion thread
>> callback returns, the scheduled function runs and the rest of the
>> application keeps processing. The last step for the unit tests is to
>> close the connection, which results in a deadlock.
>>
>> I've included some of the client log and gdb stack traces from a
>> deadlock'd client process.
>>
>> thanks,
>>
>> Kapil
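
For illustration, here is a minimal sketch of the kind of bookkeeping guard the quoted message mentions. It uses only the standard library and does not call zkpython or twisted. The names `PendingTracker`, `wrap`, `wait_idle` and `demo` are hypothetical and belong to neither library. The one idea it tries to show is that a request stops counting as outstanding only after its completion callback has fully returned, so a scheduled function that runs early still sees the request as pending. It is not a fix for ZOOKEEPER-763.

```python
import threading


class PendingTracker:
    """Counts async requests whose completion callbacks have not returned.

    Hypothetical helper for illustration; not part of zkpython or twisted.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def wrap(self, callback):
        """Register one outstanding request and return a guarded callback."""
        with self._cond:
            self._pending += 1

        def guarded(*args):
            try:
                return callback(*args)
            finally:
                # Decrement only after the callback has fully returned, so a
                # function it scheduled elsewhere cannot observe "idle" early.
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()

        return guarded

    @property
    def pending(self):
        with self._cond:
            return self._pending

    def wait_idle(self, timeout=None):
        """Block until nothing is outstanding; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


def demo():
    """Simulate one async completion delivered on a separate thread."""
    tracker = PendingTracker()
    results = []
    completion = tracker.wrap(lambda rc, value: results.append((rc, value)))

    # Stand-in for the client's completion thread invoking the callback.
    worker = threading.Thread(target=completion, args=(0, "data"))
    worker.start()

    # The point at which a caller would otherwise go on to close the handle.
    drained = tracker.wait_idle(timeout=5.0)
    worker.join()
    return drained, results


if __name__ == "__main__":
    print(demo())
```

In a wrapper along these lines, each completion passed to an async call would go through `wrap`, and the close path would check `pending` or call `wait_idle` before closing the handle. Blocking in `wait_idle` on the twisted main thread would stall the reactor, so a real integration would more likely defer the close until the count reaches zero.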