curator-dev mailing list archives

From "Orcun Simsek (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-79) InterProcessMutex doesn't clean up after interrupt
Date Wed, 27 Nov 2013 04:35:35 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833444#comment-13833444 ]

Orcun Simsek commented on CURATOR-79:
-------------------------------------

We've run into this a few times in production, and our live workaround is to kill the offending
ZK session. We're currently trying to eliminate the cause of the interrupts, but we are concerned
that 1) we may not own all sources of interruption, and 2) this deadlock can occur with
any non-KeeperException, not just an InterruptedException. A fix would be much appreciated,
and we'll try to put up a working patch as soon as possible.

> InterProcessMutex doesn't clean up after interrupt
> --------------------------------------------------
>
>                 Key: CURATOR-79
>                 URL: https://issues.apache.org/jira/browse/CURATOR-79
>             Project: Apache Curator
>          Issue Type: Bug
>            Reporter: Orcun Simsek
>            Assignee: Jordan Zimmerman
>
> InterProcessMutex can deadlock if a thread is interrupted during acquire(). Specifically,
> CreateBuilderImpl.pathInForeground submits a create request to ZooKeeper, and an InterruptedException
> is thrown after the node is created in ZK but before ZK.create returns. ZK.create propagates
> a non-KeeperException, so Curator assumes the create has failed, but does not retry, and the
> node is now orphaned. At some point in the future, the node becomes the next in the acquisition
> sequence, but is not reclaimed as the ZK session has not expired.
> <stack trace attached in comments below>
> Curator should catch the InterruptedException and other non-KeeperExceptions, and delete
> the created node before propagating these exceptions.
> (as originally discussed on https://groups.google.com/forum/#!topic/curator-users/9ii5of8SbdQ)
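
The failure mode and the proposed fix described in the issue can be illustrated with a small,
self-contained sketch. This is not Curator or ZooKeeper code: FakeZk, createUnguarded and
createGuarded are hypothetical names invented for illustration, and the fake client lets the
caller choose the full node name, whereas a real sequential node gets a server-assigned suffix
that a real fix would first have to discover. The sketch only shows the shape of the idea:
if create fails with something other than a KeeperException, delete the node before rethrowing.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch: none of these types are Curator or ZooKeeper APIs.
class OrphanCleanupSketch {

    /** In-memory stand-in for a ZooKeeper client; models only create and delete. */
    static class FakeZk {
        final Set<String> nodes = new ConcurrentSkipListSet<>();
        boolean interruptAfterCreate = false;

        /** Registers the node, then optionally fails the way the issue describes. */
        String create(String path) throws InterruptedException {
            nodes.add(path); // the node now exists on the "server"
            if (interruptAfterCreate) {
                // The caller never learns that the create actually succeeded.
                throw new InterruptedException("interrupted before create returned");
            }
            return path;
        }

        void delete(String path) {
            nodes.remove(path);
        }
    }

    /** Mirrors the reported behaviour: the exception propagates and the node stays behind. */
    static String createUnguarded(FakeZk zk, String parent) throws InterruptedException {
        return zk.create(parent + "/lock-" + UUID.randomUUID());
    }

    /** Mirrors the proposed fix: remove the possibly created node, then rethrow. */
    static String createGuarded(FakeZk zk, String parent) throws InterruptedException {
        String path = parent + "/lock-" + UUID.randomUUID();
        try {
            return zk.create(path);
        } catch (InterruptedException | RuntimeException e) {
            zk.delete(path); // best-effort cleanup; a no-op if the node was never created
            throw e;
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.interruptAfterCreate = true;

        try {
            createUnguarded(zk, "/locks");
        } catch (InterruptedException e) {
            // expected in this demo
        }
        System.out.println("orphans after unguarded create: " + zk.nodes.size());

        zk.nodes.clear();
        try {
            createGuarded(zk, "/locks");
        } catch (InterruptedException e) {
            // expected in this demo
        }
        System.out.println("orphans after guarded create: " + zk.nodes.size());
    }
}
```

In the unguarded case the node outlives the failed call, which corresponds to the orphan that
later blocks the acquisition sequence. In the guarded case the node is removed before the
exception reaches the caller.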



--
This message was sent by Atlassian JIRA
(v6.1#6144)
