zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: avoiding deadlocks on client handle close w/ python/c api
Date Tue, 04 May 2010 21:41:00 GMT
Sure, I'll take a look at it.

Thanks
mahadev


On 5/4/10 2:32 PM, "Patrick Hunt" <phunt@apache.org> wrote:

> Thanks Kapil, Mahadev perhaps you could take a look at this as well?
> 
> Patrick
> 
> On 05/04/2010 06:36 AM, Kapil Thangavelu wrote:
>> I've constructed a simple example, using just the zkpython library with
>> condition variables, that will deadlock. I've filed a new ticket for it:
>> 
>> https://issues.apache.org/jira/browse/ZOOKEEPER-763
>> 
>> the gdb stack traces look suspiciously like the ones in 591, but sans the
>> watchers.
>> https://issues.apache.org/jira/browse/ZOOKEEPER-591
>> 
>> the attached example on the ticket will deadlock in zk 3.3.0 (which has the
>> fix for 591) and trunk.
>> 
>> -kapil
>> 
>> On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu <kapil.foss@gmail.com> wrote:
>> 
>>> Hi Folks,
>>> 
>>> I'm constructing an async api on top of the zookeeper python bindings for
>>> twisted. The intent is a thin wrapper around the existing async api that
>>> integrates it with the twisted python event loop
>>> (http://www.twistedmatrix.com).
>>> 
>>> One issue I'm running into while developing unit tests: deadlocks occur
>>> if we attempt to close a handle while there are any outstanding async
>>> requests (aget, acreate, etc). Normally on close both the io thread and
>>> the completion thread are terminated and joined; however,
>>> with outstanding async requests, the completion thread won't be in a
>>> joinable state, and we effectively hang when the main thread does the join.
>>> 
>>> I'm curious if this would be considered a bug. afaics the ideal behavior
>>> would be, on close of a handle, to clear out any remaining callbacks and
>>> let the completion thread terminate.
>>> 
>>> I've tried adding some bookkeeping to the api to guard against closing
>>> while there is an outstanding completion request, but it's an imperfect
>>> solution due to the nature of the event loop integration. The problem is that
>>> the python callback invoked by the completion thread in turn schedules a
>>> function for the main thread. In twisted the api for this is implemented by
>>> appending the function to a list attribute on the reactor and then writing a
>>> byte to a pipe to wake up the main thread. If a thread switch to the main
>>> thread occurs before the completion thread callback returns, the scheduled
>>> function runs and the rest of the application keeps processing, of which the
>>> last step for the unit tests is to close the connection, which results in a
>>> deadlock.
>>> 
>>> I've included some of the client log and gdb stack traces from a deadlocked
>>> client process.
>>> 
>>> thanks,
>>> 
>>> Kapil
>>> 
>>> 
>>> 
>>> 
>> 
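
The bookkeeping described in the original message (guarding close against outstanding completions) might look roughly like the sketch below. It is not the code from the thread or from ZOOKEEPER-763. The class OutstandingRequests and its methods track and close_when_idle are hypothetical names, and close_fn stands in for whatever actually closes the handle. Only the Python standard library is used, so no zkpython calls are assumed.

```python
import threading


class OutstandingRequests:
    """Hypothetical helper: counts async requests whose completion
    callback has not yet returned."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def track(self, callback):
        """Register one request and return a wrapper to use as its completion."""
        with self._cond:
            self._pending += 1

        def wrapper(*args):
            try:
                return callback(*args)
            finally:
                # Decrement only after the user callback has returned, so a
                # close cannot start while the callback is still running.
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()

        return wrapper

    def close_when_idle(self, close_fn, timeout=None):
        """Wait until no requests are pending, then call close_fn.

        Returns True if close_fn was called, False if the wait timed out.
        """
        with self._cond:
            idle = self._cond.wait_for(lambda: self._pending == 0, timeout)
        if idle:
            close_fn()
        return idle
```

As the message notes, this kind of guard is imperfect: the wrapper is still executing inside the completion thread when the count reaches zero, so the window in which a close can overlap a callback is narrowed rather than removed.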

