hadoop-zookeeper-dev mailing list archives

From "Henry Robinson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-763) Deadlock on close w/ zkpython / c client
Date Wed, 05 May 2010 20:21:05 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864488#action_12864488 ]

Henry Robinson commented on ZOOKEEPER-763:
------------------------------------------

Kapil - 

Thanks! Adding that sleep helped me understand what was going on. 

pyzoo_close holds the GIL but blocks inside zookeeper_close, waiting for the completion thread
to finish. However, suppose a completion callback is still running inside Python and the main thread
pre-empts it and calls pyzoo_close. The callback then can't get the GIL back to finish executing,
so the completion thread blocks forever. The fix is simple: relinquish the GIL during the
zookeeper_close call, and reacquire it straight afterwards. There are even handy macros
to do this:

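Py_BEGIN_ALLOW_THREADS   /* release the GIL so a pending callback can finish */
ret = zookeeper_close(zhandles[zkhid]);
Py_END_ALLOW_THREADS     /* reacquire the GIL before touching Python objects */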
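
The deadlock, and the reason releasing the lock around the blocking call is enough, can be reproduced outside Python. What follows is a minimal, hypothetical model and not zkpython code. A plain `pthread_mutex_t` named `fake_gil` stands in for the GIL. A thread that must take it before exiting stands in for the completion thread. `close_handle()` stands in for pyzoo_close, and `pthread_join()` plays the part of the join inside zookeeper_close. The unlock/lock pair around the join corresponds to Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS.

```c
#include <pthread.h>

/* Hypothetical stand-in for the Python GIL: a plain mutex, not the real GIL. */
static pthread_mutex_t fake_gil = PTHREAD_MUTEX_INITIALIZER;

/* Set by the stand-in completion thread once its "callback" has finished. */
static int callback_ran = 0;

/* Stand-in for the completion thread: it must re-acquire the GIL to finish
 * executing a Python callback before it can exit and be joined. */
static void *completion_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&fake_gil);      /* waits while the closer holds the GIL */
    callback_ran = 1;                   /* "finish executing the callback" */
    pthread_mutex_unlock(&fake_gil);
    return NULL;
}

/* Stand-in for pyzoo_close(): entered with the GIL held. The join runs with
 * the GIL released, mirroring Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS.
 * Dropping the unlock/lock pair reproduces the hang from this issue. */
static int close_handle(pthread_t completion)
{
    int ret;
    pthread_mutex_unlock(&fake_gil);          /* ~ Py_BEGIN_ALLOW_THREADS */
    ret = pthread_join(completion, NULL);     /* ~ join inside zookeeper_close() */
    pthread_mutex_lock(&fake_gil);            /* ~ Py_END_ALLOW_THREADS */
    return ret;
}

/* Runs the scenario once; returns 0 if the join succeeded and the callback ran. */
int demo_close_with_gil_released(void)
{
    pthread_t completion;
    int ret;

    callback_ran = 0;
    pthread_mutex_lock(&fake_gil);            /* caller holds the GIL */
    if (pthread_create(&completion, NULL, completion_thread, NULL) != 0) {
        pthread_mutex_unlock(&fake_gil);
        return -1;
    }
    ret = close_handle(completion);
    pthread_mutex_unlock(&fake_gil);
    return (ret == 0 && callback_ran == 1) ? 0 : -1;
}
```

`demo_close_with_gil_released()` returns 0 once the join completes and the callback has run. If the unlock/lock pair in `close_handle()` is removed, the main thread joins while holding the mutex that the other thread is waiting for, and the call never returns. Compile with `-pthread`.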

This same issue will affect any part of zkpython where a call into the C client blocks on
work that has to be completed in another Python thread - in practice, I think this means
callbacks. I'll audit the code to see whether any other API calls are affected. A patch to fix
this issue will follow shortly - Kapil, I'd be very grateful if you could help us by testing it.


cheers,
Henry

> Deadlock on close w/ zkpython / c client
> ----------------------------------------
>
>                 Key: ZOOKEEPER-763
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-763
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client, contrib-bindings
>    Affects Versions: 3.3.0
>         Environment: ubuntu 10.04, zookeeper 3.3.0 and trunk
>            Reporter: Kapil Thangavelu
>            Assignee: Mahadev konar
>             Fix For: 3.4.0
>
>         Attachments: deadlock.py, deadlock_v2.py, stack-trace-deadlock.txt
>
>
> Deadlocks occur if we attempt to close a handle while there are any outstanding async
requests (aget, acreate, etc.). Normally on close both the io thread and the completion
thread are terminated and joined. However, with outstanding async requests, the completion
thread won't be in a joinable state, and we effectively hang when the main thread does the
join.
> AFAICS, the ideal behavior on closing a handle would be to effectively clear out any remaining
callbacks and let the completion thread terminate.
> I've tried adding some bookkeeping within a Python client to guard against closing
while there is an outstanding async completion request, but it's an imperfect solution, since
even after the Python callback has executed there is still a window for deadlock before the
completion thread finishes the callback.
> A simple example that reproduces the deadlock is attached.
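
The behavior proposed in the report, clearing out remaining callbacks on close so that the completion thread can terminate, can be sketched as a generic mutex/condition-variable queue. Everything here is hypothetical and does not reflect how the ZooKeeper C client actually stores completions: `completion_queue_t`, `queue_start()`, `queue_push()`, `queue_close()`, and the `RC_CLOSING` code. This version also assumes a particular design choice. Pending callbacks are not silently dropped. Each one is invoked exactly once with a "closing" result code on the thread that closes, so callers can still release their state. As Henry's comment above explains, the thread calling `queue_close()` must also not hold any lock (such as the GIL) that a running callback needs, or the join blocks just as before.

```c
#include <pthread.h>
#include <stdlib.h>

enum { RC_OK = 0, RC_CLOSING = -1 };     /* hypothetical result codes */
/* Hypothetical completion queue; not the ZooKeeper C client's real layout. */
typedef struct completion {
    void (*callback)(int rc, void *data);
    void *data;
    struct completion *next;
} completion_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    completion_t *head, *tail;
    int closing;                          /* set once by queue_close() */
    pthread_t worker;
} completion_queue_t;

static void *completion_worker(void *arg)
{
    completion_queue_t *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head == NULL && !q->closing)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->closing)
            break;                        /* leave queued items to queue_close() */
        completion_t *c = q->head;
        q->head = c->next;
        if (q->head == NULL) q->tail = NULL;
        pthread_mutex_unlock(&q->lock);   /* never run callbacks under the lock */
        c->callback(RC_OK, c->data);
        free(c);
        pthread_mutex_lock(&q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

int queue_start(completion_queue_t *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = q->tail = NULL;
    q->closing = 0;
    return pthread_create(&q->worker, NULL, completion_worker, q);
}

/* Callers must not push once queue_close() has been called. */
int queue_push(completion_queue_t *q, void (*cb)(int, void *), void *data)
{
    completion_t *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->callback = cb; c->data = data; c->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = c; else q->head = c;
    q->tail = c;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Flag shutdown, wake and join the worker, then clear out anything still
 * queued by invoking it once with RC_CLOSING on the closing thread. */
void queue_close(completion_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    q->closing = 1;
    pthread_cond_broadcast(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->worker, NULL);        /* worker exits at its next check */
    completion_t *c = q->head;
    q->head = q->tail = NULL;
    while (c != NULL) {
        completion_t *next = c->next;
        c->callback(RC_CLOSING, c->data);
        free(c);
        c = next;
    }
    pthread_cond_destroy(&q->cond);
    pthread_mutex_destroy(&q->lock);
}

/* Usage: queue n callbacks, close at once, count how many were invoked. */
static int invoked;   /* worker writes happen before the join, closer's after */
static void count_cb(int rc, void *data) { (void)rc; (void)data; invoked++; }
int demo_close_with_pending(int n)
{
    completion_queue_t q;
    invoked = 0;
    if (queue_start(&q) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        queue_push(&q, count_cb, NULL);
    queue_close(&q);
    return invoked;
}
```

`demo_close_with_pending(3)` returns 3 no matter how many callbacks the worker had already run when the close happened. Each queued completion runs exactly once, either normally or with `RC_CLOSING`, and the close never waits on a worker blocked in the queue. Calling `queue_close()` from inside a callback would make the worker try to join itself, so a real implementation would need to guard against that case as well.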

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

