curator-dev mailing list archives

From "Orcun Simsek (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CURATOR-79) InterProcessMutex doesn't clean up after interrupt
Date Wed, 27 Nov 2013 04:33:36 GMT

     [ https://issues.apache.org/jira/browse/CURATOR-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Orcun Simsek updated CURATOR-79:
--------------------------------

    Description: 
InterProcessMutex can deadlock if a thread is interrupted during acquire().  Specifically,
CreateBuilderImpl.pathInForeground submits a create request to ZooKeeper, and an InterruptedException
is thrown after the node is created in ZK but before ZK.create returns. ZK.create propagates
a non-KeeperException, so Curator assumes the create has failed, but does not retry, and the
node is now orphaned. At some point in the future, the node becomes the next in the acquisition
sequence, but is not reclaimed as the ZK session has not expired.

<stack trace attached in comments below>

Curator should catch the InterruptedException and other non-KeeperExceptions, and delete the
created node before propagating these exceptions.

(as originally discussed on https://groups.google.com/forum/#!topic/curator-users/9ii5of8SbdQ)
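
The proposed cleanup can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Curator's or ZooKeeper's actual code: FakeStore, createSequential, deleteMatching, createWithCleanup and the marker prefix are all hypothetical names. It also assumes the lock node gets a server-assigned sequence suffix, so the caller never learns the final node name when create does not return; the sketch therefore embeds a client-generated marker in the name so the orphan can be found and deleted.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

final class InterruptCleanupSketch {

    /** Hypothetical in-memory stand-in for a ZooKeeper client; not the real API. */
    static final class FakeStore {
        final Set<String> nodes = new ConcurrentSkipListSet<>();
        final AtomicInteger sequence = new AtomicInteger();
        volatile boolean interruptAfterCreate;

        /** Creates the node, then optionally simulates an interrupt before returning. */
        String createSequential(String prefix) throws InterruptedException {
            String path = prefix + String.format("%010d", sequence.getAndIncrement());
            nodes.add(path);
            if (interruptAfterCreate) {
                // The node already exists, but the caller never sees its name.
                throw new InterruptedException("interrupted before create returned");
            }
            return path;
        }

        /** Deletes every node whose name contains the given marker. */
        void deleteMatching(String marker) {
            nodes.removeIf(path -> path.contains(marker));
        }
    }

    /** Creates a lock node and removes it again if create fails after the fact. */
    static String createWithCleanup(FakeStore store, String lockPath)
            throws InterruptedException {
        // Assumption: a client-side marker makes the possibly created node findable.
        String marker = UUID.randomUUID().toString();
        try {
            return store.createSequential(lockPath + "/" + marker + "-lock-");
        } catch (InterruptedException | RuntimeException e) {
            store.deleteMatching(marker); // clean up before propagating
            throw e;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.interruptAfterCreate = true;
        try {
            createWithCleanup(store, "/locks/demo");
        } catch (InterruptedException e) {
            System.out.println("orphans left: " + store.nodes.size());
        }
    }
}
```

Against a real ZooKeeper connection the delete itself can block or fail, so a real fix would presumably need to retry the cleanup or defer it; the sketch leaves that out.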


  was:
InterProcessMutex can deadlock if a thread is interrupted during acquire().  Specifically,
CreateBuilderImpl.pathInForeground submits a create request to ZooKeeper, and an InterruptedException
is thrown after the node is created in ZK but before ZK.create returns. ZK.create propagates
a non-KeeperException, so Curator assumes the create has failed, but does not retry, and the
node is now orphaned. At some point in the future, the node becomes the next in the acquisition
sequence, but is not reclaimed as the ZK session has not expired.

{code}
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:781)
	at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625)
	at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609)
	at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
	at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605)
	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428)
	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:408)
	at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41)
	at com.netflix.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:222)
	at com.netflix.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:218)
	at com.netflix.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:74)
	at com.palantir.finance.server.service.storage.CuratorLockTests.testInterruptDeadlock(CuratorLockTests.java:50)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
{code}

Curator should catch the InterruptedException and other non-KeeperExceptions, and delete the
created node before propagating these exceptions.

(as originally discussed on https://groups.google.com/forum/#!topic/curator-users/9ii5of8SbdQ)



> InterProcessMutex doesn't clean up after interrupt
> --------------------------------------------------
>
>                 Key: CURATOR-79
>                 URL: https://issues.apache.org/jira/browse/CURATOR-79
>             Project: Apache Curator
>          Issue Type: Bug
>            Reporter: Orcun Simsek
>            Assignee: Jordan Zimmerman
>
> InterProcessMutex can deadlock if a thread is interrupted during acquire().  Specifically,
> CreateBuilderImpl.pathInForeground submits a create request to ZooKeeper, and an InterruptedException
> is thrown after the node is created in ZK but before ZK.create returns. ZK.create propagates
> a non-KeeperException, so Curator assumes the create has failed, but does not retry, and the
> node is now orphaned. At some point in the future, the node becomes the next in the acquisition
> sequence, but is not reclaimed as the ZK session has not expired.
>
> <stack trace attached in comments below>
>
> Curator should catch the InterruptedException and other non-KeeperExceptions, and delete
> the created node before propagating these exceptions.
>
> (as originally discussed on https://groups.google.com/forum/#!topic/curator-users/9ii5of8SbdQ)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
