curator-dev mailing list archives

From "Cameron McKenzie (JIRA)" <>
Subject [jira] [Commented] (CURATOR-79) InterProcessMutex doesn't clean up after interrupt
Date Fri, 08 Aug 2014 05:56:11 GMT


Cameron McKenzie commented on CURATOR-79:

So, a fix for this is more complicated than I was hoping.

The recipes already use protected ephemeral nodes, so the normal case is covered: you attempt
to create a node and lose the connection after you've submitted the create request but before
you've got a response. The problem is that this only handles the ConnectionLoss exception.
If any other type of exception occurs, the logic to remove the potentially created ephemeral
node does not fire.

It is possible, but a bit messy, to handle this at the LockInternals level. If we get an
exception while trying to create the zNode, we can try to remove the potentially created
node, but we don't know its name. So, we'd need to query all the children of the parent
lock path, and then work out which ones are ephemeral nodes that are owned by the current
session and aren't known about by the current lock instance (i.e. they are orphans). I've
implemented this, but it's a bit messy and requires changes to the clients of LockInternals.
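
The orphan check described above could be sketched roughly as follows. This is a minimal,
self-contained illustration and not the implementation mentioned above: OrphanLockNodes,
ChildNode and findOrphans are hypothetical names, and the children are passed in as plain
data. In real code the owner would presumably come from ZooKeeper's Stat.getEphemeralOwner()
and the session id from ZooKeeper.getSessionId().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: find lock nodes owned by this session that no lock instance tracks. */
final class OrphanLockNodes {

    /** Hypothetical view of one child of the lock path (not a Curator or ZooKeeper type). */
    static final class ChildNode {
        final String name;
        final long ephemeralOwner; // session id of the owner; 0 means the node is not ephemeral

        ChildNode(String name, long ephemeralOwner) {
            this.name = name;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    /**
     * Returns the names of children that are ephemeral, owned by sessionId,
     * and absent from knownNodes (the nodes the lock instance knows it created).
     */
    static List<String> findOrphans(List<ChildNode> children, long sessionId, Set<String> knownNodes) {
        List<String> orphans = new ArrayList<>();
        for (ChildNode child : children) {
            boolean ownedBySession = child.ephemeralOwner != 0 && child.ephemeralOwner == sessionId;
            if (ownedBySession && !knownNodes.contains(child.name)) {
                orphans.add(child.name);
            }
        }
        return orphans;
    }
}
```

The messy part is visible in the signature: the caller has to supply the set of nodes that
the lock instance already knows about, which is why this approach leaks into the clients of
LockInternals.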

So, I think this needs some more thought. Perhaps the logic in the protected node handling
can be extended to fire on exceptions other than ConnectionLoss, including non-KeeperExceptions.
Any thoughts from anyone else?
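
To make the idea concrete, here is a rough sketch of cleanup that fires on any exception
rather than only on ConnectionLoss. It is an illustration under stated assumptions, not
Curator's actual code: ProtectedCreateSketch, Client and createWithCleanup are hypothetical
names, and the "_c_<id>-" name format is only modeled on the protected-mode prefix.

```java
import java.util.List;

/** Hypothetical sketch: remove a possibly created protected node after any create failure. */
final class ProtectedCreateSketch {

    /** Hypothetical minimal client; this is not a Curator or ZooKeeper interface. */
    interface Client {
        String create(String parent, String nodeName) throws Exception;
        List<String> getChildren(String parent) throws Exception;
        void delete(String path) throws Exception;
    }

    /**
     * Creates a node whose name embeds protectionId. If create throws anything at all,
     * looks for a child carrying that id, deletes it, and then rethrows the original exception.
     */
    static String createWithCleanup(Client client, String parent, String baseName, String protectionId)
            throws Exception {
        String nodeName = "_c_" + protectionId + "-" + baseName; // assumed name format
        try {
            return client.create(parent, nodeName);
        } catch (Exception e) {
            try {
                deleteByProtectionId(client, parent, protectionId);
            } catch (Exception cleanupFailure) {
                // Keep the original failure primary; record the cleanup problem alongside it.
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }

    /** Deletes every child of parent whose name contains the protection id. */
    private static void deleteByProtectionId(Client client, String parent, String protectionId)
            throws Exception {
        for (String child : client.getChildren(parent)) {
            if (child.contains(protectionId)) {
                client.delete(parent + "/" + child);
            }
        }
    }
}
```

Because the protection id is chosen before the create request is sent, the node can be found
afterwards without knowing its final name, which avoids the session-ownership scan above.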

> InterProcessMutex doesn't clean up after interrupt
> --------------------------------------------------
>                 Key: CURATOR-79
>                 URL:
>             Project: Apache Curator
>          Issue Type: Bug
>    Affects Versions: 2.0.0-incubating, 2.1.0-incubating, 2.2.0-incubating, 2.3.0
>            Reporter: Orcun Simsek
>            Assignee: Jordan Zimmerman
> InterProcessMutex can deadlock if a thread is interrupted during acquire().  Specifically,
> CreateBuilderImpl.pathInForeground submits a create request to ZooKeeper, and an InterruptedException
> is thrown after the node is created in ZK but before ZK.create returns. ZK.create propagates
> a non-KeeperException, so Curator assumes the create has failed, but does not retry, and the
> node is now orphaned. At some point in the future, the node becomes the next in the acquisition
> sequence, but is not reclaimed as the ZK session has not expired.
> <stack trace attached in comments below>
> Curator should catch the InterruptedException and other non-KeeperExceptions, and delete
> the created node before propagating these exceptions.
> (as originally discussed on!topic/curator-users/9ii5of8SbdQ)

