hadoop-zookeeper-user mailing list archives

From Kapil Thangavelu <kapil.f...@gmail.com>
Subject Re: avoiding deadlocks on client handle close w/ python/c api
Date Tue, 04 May 2010 13:36:03 GMT
I've constructed a simple example, using just the zkpython library with
condition variables, that will deadlock. I've filed a new ticket for it:

https://issues.apache.org/jira/browse/ZOOKEEPER-763

The gdb stack traces look suspiciously like the ones in ZOOKEEPER-591, but
without the watchers:
https://issues.apache.org/jira/browse/ZOOKEEPER-591

The example attached to the ticket deadlocks on zk 3.3.0 (which has the
fix for 591) and on trunk.

-kapil

On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu <kapil.foss@gmail.com> wrote:

> Hi Folks,
>
> I'm constructing an async api on top of the zookeeper python bindings for
> twisted. The intent is a thin wrapper around the existing async api that
> integrates with the twisted python event loop
> (http://www.twistedmatrix.com).
>
> One issue I'm running into while developing unit tests: deadlocks occur
> if we attempt to close a handle while there are any outstanding async
> requests (aget, acreate, etc). Normally on close both the io thread and
> the completion thread are terminated and joined; however, with
> outstanding async requests, the completion thread won't be in a joinable
> state, and we effectively hang when the main thread does the join.
>
> I'm curious whether this would be considered a bug. As far as I can see,
> the ideal behavior on close of a handle would be to clear out any
> remaining callbacks and let the completion thread terminate.
>
> I've tried adding some bookkeeping to the api to guard against closing
> while there is an outstanding completion request, but it's an imperfect
> solution due to the nature of the event loop integration. The problem is
> that the python callback invoked by the completion thread in turn schedules
> a function for the main thread. In twisted the api for this is implemented
> by appending the function to a list attribute on the reactor and then
> writing a byte to a pipe to wake up the main thread. If a thread switch to
> the main thread occurs before the completion thread callback returns, the
> scheduled function runs and the rest of the application keeps processing.
> For the unit tests the last step of that processing is to close the
> connection, which results in a deadlock.
>
> I've included some of the client log and gdb stack traces from a
> deadlocked client process.
>
> thanks,
>
> Kapil
>
>
>
>
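The bookkeeping guard described in the quoted message could look roughly like the sketch below. It is not the code from the ticket or from the wrapper itself: the names OutstandingRequests, wrap, wait_idle and guarded_close are hypothetical, and the sketch uses only the standard library so it runs without a ZooKeeper server. Each completion callback is wrapped so that a counter is decremented only after the callback has returned, and the close is refused until the counter reaches zero.

```python
import threading


class OutstandingRequests:
    """Counts async requests whose completion callbacks have not yet returned.

    Hypothetical helper for illustration; not part of zkpython.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def wrap(self, callback):
        """Register one request and return a completion wrapper for it."""
        with self._cond:
            self._pending += 1

        def completion(*args):
            try:
                callback(*args)
            finally:
                # Decrement only after the user callback has finished, so a
                # waiter is never released while the callback is running.
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()

        return completion

    def wait_idle(self, timeout=None):
        """Block until nothing is pending; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


def guarded_close(tracker, close_handle, timeout=5.0):
    """Call close_handle() only once all tracked completions have returned."""
    if not tracker.wait_idle(timeout):
        raise RuntimeError("async requests still outstanding; refusing to close")
    close_handle()


if __name__ == "__main__":
    tracker = OutstandingRequests()
    results = []
    # A timer thread stands in for the client's completion thread here.
    threading.Timer(0.05, tracker.wrap(results.append), args=("data",)).start()
    guarded_close(tracker, lambda: print("closed after", results))
```

With the real bindings, the wrapped callback would presumably be passed as the completion argument of a call such as aget, and close_handle would call the library's close on the handle; that wiring is an assumption and is not shown. The guard only narrows the window described above: it cannot observe when the C-level completion call has fully unwound, it must never be invoked from inside a completion callback, and blocking in wait_idle on the reactor thread would stall the event loop.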
