hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: avoiding deadlocks on client handle close w/ python/c api
Date Tue, 04 May 2010 21:32:17 GMT
Thanks Kapil. Mahadev, perhaps you could take a look at this as well?

Patrick

On 05/04/2010 06:36 AM, Kapil Thangavelu wrote:
> I've constructed a simple example, using just the zkpython library with
> condition variables, that will deadlock. I've filed a new ticket for it:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-763
>
> the gdb stack traces look suspiciously like the ones in 591, but sans the
> watchers.
> https://issues.apache.org/jira/browse/ZOOKEEPER-591
>
> the attached example on the ticket will deadlock in zk 3.3.0 (which has the
> fix for 591) and trunk.
>
> -kapil
>
> On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu <kapil.foss@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I'm constructing an async api on top of the zookeeper python bindings for
>> twisted. The intent was to make a thin wrapper that would wrap the existing
>> async api with one that allows for integration with the twisted python event
>> loop (http://www.twistedmatrix.com) primarily using the async apis.
>>
>> One issue I'm running into while developing unit tests is that deadlocks
>> occur if we attempt to close a handle while there are any outstanding async
>> requests (aget, acreate, etc). Normally on close both the io thread and
>> the completion thread are terminated and joined; however, with
>> outstanding async requests, the completion thread won't be in a
>> joinable state, and we effectively hang when the main thread does the join.
>>
>> I'm curious if this would be considered a bug; afaics the ideal behavior
>> would be, on close of a handle, to effectively clear out any remaining
>> callbacks and let the completion thread terminate.
>>
>> I've tried adding some bookkeeping to the api to guard against closing
>> while there is an outstanding completion request, but it's an imperfect
>> solution due to the nature of the event loop integration. The problem is that
>> the python callback invoked by the completion thread in turn schedules a
>> function for the main thread. In twisted the api for this is implemented by
>> appending the function to a list attribute on the reactor and then writing a
>> byte to a pipe to wake up the main thread. If a thread switch to the main
>> thread occurs before the completion thread callback returns, the scheduled
>> function runs and the rest of the application keeps processing, of which the
>> last step for the unit tests is to close the connection, which results in a
>> deadlock.
>>
>> I've included some of the client log and gdb stack traces from a deadlocked
>> client process.
>>
>> thanks,
>>
>> Kapil
>>
>>
>>
>>
>
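
For readers following the thread, the bookkeeping workaround described in the original message could look roughly like the sketch below. It is not the code from the thread or from zkpython: the names PendingTracker, wrap, wait_idle and demo are hypothetical, and a threading.Timer stands in for the C client's completion thread so the example runs without a ZooKeeper server. The counter is decremented only after the user callback has returned, which is the window the original message identifies as the problem.

```python
import threading


class PendingTracker:
    """Counts async requests whose completion callbacks have not yet returned.

    Hypothetical helper for illustration; not part of zkpython.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def wrap(self, callback):
        """Register one outstanding request and return a guarded callback."""
        with self._cond:
            self._pending += 1

        def guarded(*args, **kwargs):
            try:
                return callback(*args, **kwargs)
            finally:
                # Decrement only after the user callback has returned, so the
                # count still covers a callback that is in the middle of running.
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()

        return guarded

    def wait_idle(self, timeout=None):
        """Block until nothing is pending; return False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)

    @property
    def pending(self):
        with self._cond:
            return self._pending


def demo():
    tracker = PendingTracker()
    results = []
    completion = tracker.wrap(lambda rc, value: results.append((rc, value)))
    # Stand-in for the completion thread invoking the callback later.
    worker = threading.Timer(0.05, completion, args=(0, "data"))
    worker.start()
    # Only close the handle once this returns True.
    idle = tracker.wait_idle(timeout=5.0)
    worker.join()
    return idle, results
```

In real use the guarded callback would be what gets passed as the completion argument to calls such as aget or acreate, with wait_idle called before closing the handle. Blocking the main thread like this is at odds with an event loop such as Twisted's, and the sketch does nothing about the behaviour of the C client itself, so treat it as a way to narrow the race rather than a fix.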
