Date: Fri, 16 May 2014 10:30:05 +0000 (UTC)
From: "Josh Elser (JIRA)"
Reply-To: jira@apache.org
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-2764) Stopping MAC before it's processes have fully started causes an indefinite hang

    [ https://issues.apache.org/jira/browse/ACCUMULO-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998846#comment-13998846 ]

Josh Elser commented on ACCUMULO-2764:
--------------------------------------

We do the same destroy() && waitFor() in 1.5 too.

> Stopping MAC before it's processes have fully started causes an indefinite hang
> -------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-2764
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2764
>             Project: Accumulo
>          Issue Type: Bug
>          Components: mini
>    Affects Versions: 1.5.1, 1.6.0
>         Environment: OpenJDK 1.6.0, CentOS 6.5, 2CPU, 6GB RAM (virtual hardware)
>            Reporter: Christopher Tubbs
>             Fix For: 1.5.2, 1.6.1, 1.7.0
>
>
> I saw this while testing 1.6.0-RC5.
> Calling process.destroy() and then process.waitFor(), as MiniAccumuloCluster does in its stop method, before the process is fully started, appears to create an indefinite hang.
> I saw this most recently in MiniAccumuloClusterGCTest.testAccurateProcessListReturned, which gets a ProcessReference and then immediately shuts down MAC, though it was also the root cause of ACCUMULO-2756. In this instance, the test got stuck in the MAC teardown.
> {code:java}
> "main" prio=10 tid=0x00007f3cf4008800 nid=0x2b19 in Object.wait() [0x00007f3cf8f9c000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
> 	at java.lang.Object.wait(Object.java:502)
> 	at java.lang.UNIXProcess.waitFor(UNIXProcess.java:181)
> 	- locked <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
> 	at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.stop(MiniAccumuloClusterImpl.java:607)
> 	at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterGCTest.tearDownMiniCluster(MiniAccumuloClusterGCTest.java:74)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:622)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
> It appears that destroy() doesn't actually succeed in destroying a process which is still starting, so the waitFor() waits indefinitely. I haven't debugged further. It may be a JVM bug, or a limitation in the Java Process API, or some UNIX signal handling quirk with process instantiation that destroy() cannot know.
> One fix could be to make start() wait until the metadata table can be scanned before it returns, to ensure all processes are actually running and ready. Another fix would be to have the teardown code try another destroy if waitFor() doesn't return after a reasonable amount of time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
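
For illustration only, the second fix suggested in the description (retry destroy() when waitFor() does not return in a reasonable time) could be sketched roughly as below. This is not the MiniAccumuloCluster code or the committed fix. The class name ProcessStopper, the method stopWithRetry, and the attempt count and timeout values are all hypothetical, and the sketch assumes Java 8 or later, since Process.waitFor(long, TimeUnit) and Process.destroyForcibly() do not exist on the OpenJDK 1.6 environment from the report.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Accumulo: bounded wait with repeated destroy().
public class ProcessStopper {

    /**
     * Calls destroy() and waits up to timeoutMillis for the process to exit,
     * repeating up to maxAttempts times. If the process is still alive after
     * that, falls back to destroyForcibly() and one more bounded wait.
     *
     * @return true if the process exited within the allotted attempts
     */
    public static boolean stopWithRetry(Process process, int maxAttempts, long timeoutMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            process.destroy();
            // Bounded wait (Java 8+), so a missed destroy() cannot hang the caller forever.
            if (process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        }
        // Last resort: request forcible termination and wait one more time.
        process.destroyForcibly();
        return process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Assumes a UNIX-like system with a "sleep" binary on the PATH.
        Process p = new ProcessBuilder("sleep", "60").start();
        boolean stopped = stopWithRetry(p, 3, 2000L);
        System.out.println("stopped=" + stopped);
    }
}
```

The point of the sketch is only that the teardown thread never blocks in an unbounded waitFor(); whether repeated destroy() calls actually work around the behavior described above has not been verified here.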