accumulo-notifications mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2764) Stopping MAC before its processes have fully started causes an indefinite hang
Date Fri, 16 May 2014 10:53:40 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999064#comment-13999064 ]

ASF subversion and git services commented on ACCUMULO-2764:
-----------------------------------------------------------

Commit 57f27635b0414ae3198995f932ccac2501eb73cd in accumulo's branch refs/heads/master from [~elserj]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57f2763 ]

ACCUMULO-2764 Wrap the MAC process termination in a Callable to get timeout semantics
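
The approach the commit message describes is to run the blocking waitFor() inside a Callable and bound it with a Future timeout. A minimal sketch of that pattern, assuming Java 6-era concurrency APIs; the class name, method signature, and timeout handling below are illustrative, not taken from the actual commit:

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedProcessStop {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  // Destroy the process, then bound the wait for it to exit by running
  // waitFor() inside a Callable and applying a timeout to the Future.
  public int stop(final Process proc, long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException, TimeoutException {
    proc.destroy();
    Future<Integer> exitCode = executor.submit(new Callable<Integer>() {
      @Override
      public Integer call() throws Exception {
        return proc.waitFor(); // blocks until the child process exits
      }
    });
    // get() throws TimeoutException if the process never dies, instead of
    // hanging the caller indefinitely the way a bare waitFor() would.
    return exitCode.get(timeout, unit);
  }
}
{code}

On timeout the caller regains control and can log the stuck process or attempt another destroy(), rather than blocking forever.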


> Stopping MAC before its processes have fully started causes an indefinite hang
> ------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-2764
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2764
>             Project: Accumulo
>          Issue Type: Bug
>          Components: mini
>    Affects Versions: 1.5.1, 1.6.0
>         Environment: OpenJDK 1.6.0, CentOS 6.5, 2CPU, 6GB RAM (virtual hardware)
>            Reporter: Christopher Tubbs
>            Assignee: Josh Elser
>             Fix For: 1.5.2, 1.6.1, 1.7.0
>
>
> I saw this while testing 1.6.0-RC5.
> Calling process.destroy() and then process.waitFor(), as MiniAccumuloCluster does in its stop method, before the process is fully started appears to create an indefinite hang.
> I saw this most recently in MiniAccumuloClusterGCTest.testAccurateProcessListReturned, which gets a ProcessReference and then immediately shuts down MAC, though it was also the root cause of ACCUMULO-2756. In this instance, the test got stuck in the MAC teardown (stack trace below, followed by a reduced sketch of the pattern).
> {code:java}
> "main" prio=10 tid=0x00007f3cf4008800 nid=0x2b19 in Object.wait() [0x00007f3cf8f9c000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
>         at java.lang.Object.wait(Object.java:502)
>         at java.lang.UNIXProcess.waitFor(UNIXProcess.java:181)
>         - locked <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
>         at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.stop(MiniAccumuloClusterImpl.java:607)
>         at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterGCTest.tearDownMiniCluster(MiniAccumuloClusterGCTest.java:74)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:622)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>         at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>         at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>         at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
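> Reduced to a hypothetical standalone repro (a sketch of the reported pattern, not the actual MiniAccumuloClusterImpl code; the command name is a placeholder):
> {code:java}
> import java.io.IOException;
>
> public class DestroyBeforeStart {
>   public static void main(String[] args) throws IOException, InterruptedException {
>     // Hypothetical repro: tear down a child process before it has fully started.
>     Process proc = new ProcessBuilder("some-slow-starting-command").start();
>     proc.destroy();  // issued while the child is still initializing
>     proc.waitFor();  // reported to block forever in this situation
>   }
> }
> {code}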
> It appears that destroy() doesn't actually succeed in destroying a process which is still starting, so the waitFor() waits indefinitely. I haven't debugged further. It may be a JVM bug, a limitation in the Java Process API, or some UNIX signal-handling quirk during process instantiation that destroy() cannot account for.
> One fix could be to make start() wait until the metadata table can be scanned before it returns, to ensure all processes are actually running and ready. Another fix would be to have the teardown code try another destroy() if waitFor() doesn't return after a reasonable amount of time (see the sketch below).
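> A minimal sketch of that second approach; the method name, poll interval, and retry count are illustrative only. The pre-Java-8 Process API has no timed waitFor(), so a bounded wait has to poll exitValue(), which throws IllegalThreadStateException while the process is still running:
> {code:java}
> // Illustrative only: destroy the process, poll for exit for up to waitMillis,
> // and issue destroy() again if it is still alive.
> static void destroyWithRetry(Process proc, long waitMillis, int maxAttempts)
>     throws InterruptedException {
>   for (int attempt = 0; attempt < maxAttempts; attempt++) {
>     proc.destroy();
>     long deadline = System.currentTimeMillis() + waitMillis;
>     while (System.currentTimeMillis() < deadline) {
>       try {
>         proc.exitValue(); // returns normally only once the process has exited
>         return;
>       } catch (IllegalThreadStateException stillRunning) {
>         Thread.sleep(100); // still alive; poll again shortly
>       }
>     }
>   }
>   throw new IllegalStateException("process still alive after " + maxAttempts + " destroy() calls");
> }
> {code}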


