Date: Fri, 16 May 2014 10:53:40 +0000 (UTC)
From: "ASF subversion and git services (JIRA)"
Reply-To: jira@apache.org
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-2764) Stopping MAC before it's processes have fully started causes an indefinite hang


    [ https://issues.apache.org/jira/browse/ACCUMULO-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999063#comment-13999063 ]

ASF subversion and git services commented on ACCUMULO-2764:
-----------------------------------------------------------

Commit 57f27635b0414ae3198995f932ccac2501eb73cd in accumulo's branch refs/heads/1.6.1-SNAPSHOT from [~elserj]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=57f2763 ]

ACCUMULO-2764 Wrap the MAC process termination in a Callable to get timeout semantics


> Stopping MAC before it's processes have fully started causes an indefinite hang
> -------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-2764
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2764
>             Project: Accumulo
>          Issue Type: Bug
>          Components: mini
>    Affects Versions: 1.5.1, 1.6.0
>         Environment: OpenJDK 1.6.0, CentOS 6.5, 2CPU, 6GB RAM (virtual hardware)
>            Reporter: Christopher Tubbs
>            Assignee: Josh Elser
>             Fix For: 1.5.2, 1.6.1, 1.7.0
>
>
> I saw this while testing 1.6.0-RC5.
> Calling process.destroy() and then process.waitFor(), as MiniAccumuloCluster does in its stop method, before the process is fully started, appears to create an indefinite hang.
> I saw this most recently in MiniAccumuloClusterGCTest.testAccurateProcessListReturned, which gets a ProcessReference and then immediately shuts down MAC, though it was also the root cause of ACCUMULO-2756. In this instance, the test got stuck in the MAC teardown.
> {code:java}
> "main" prio=10 tid=0x00007f3cf4008800 nid=0x2b19 in Object.wait() [0x00007f3cf8f9c000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     - waiting on <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
>     at java.lang.Object.wait(Object.java:502)
>     at java.lang.UNIXProcess.waitFor(UNIXProcess.java:181)
>     - locked <0x00000000e29dd2e8> (a java.lang.UNIXProcess)
>     at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.stop(MiniAccumuloClusterImpl.java:607)
>     at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterGCTest.tearDownMiniCluster(MiniAccumuloClusterGCTest.java:74)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:622)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}
> It appears that destroy() doesn't actually succeed in destroying a process which is still starting, so the waitFor() waits indefinitely. I haven't debugged further. It may be a JVM bug, or a limitation in the Java Process API, or some UNIX signal handling quirk with process instantiation that destroy() cannot know.
> One fix could be to make start() wait until the metadata table can be scanned before it returns, to ensure all processes are actually running and ready. Another fix would be to have the teardown code try another destroy if waitFor() doesn't return after a reasonable amount of time.
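
For readers following the thread, here is a minimal sketch of the two ideas mentioned above: running waitFor() inside a Callable so the caller gets timeout semantics, and retrying destroy() when the wait does not return in time. This is not the code from the commit. The class name BoundedProcessStop, the method stopWithTimeout, and its parameters are hypothetical names chosen for illustration. It uses only java.util.concurrent and the plain Process API, so it avoids the timed Process.waitFor overload that was only added in Java 8.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Accumulo: bounds the time spent waiting
// for a child process to exit after destroy() has been called.
public class BoundedProcessStop {

  /**
   * Calls destroy() and waits up to the given timeout for the process to exit,
   * repeating the destroy() call up to maxAttempts times.
   *
   * @return the exit value of the process
   * @throws TimeoutException if the process is still running after all attempts
   */
  public static int stopWithTimeout(final Process process, long timeout, TimeUnit unit,
      int maxAttempts) throws InterruptedException, ExecutionException, TimeoutException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        process.destroy();

        // waitFor() has no timeout of its own here, so run it on a worker
        // thread and bound the wait with Future.get(timeout, unit).
        Future<Integer> exit = executor.submit(new Callable<Integer>() {
          @Override
          public Integer call() throws InterruptedException {
            return process.waitFor();
          }
        });

        try {
          return exit.get(timeout, unit);
        } catch (TimeoutException e) {
          // Interrupt the blocked waitFor() so the worker thread is free
          // for the next attempt, then loop and call destroy() again.
          exit.cancel(true);
        }
      }
      throw new TimeoutException(
          "process did not exit after " + maxAttempts + " destroy() attempts");
    } finally {
      // Make sure no worker thread outlives the call.
      executor.shutdownNow();
    }
  }
}
```

A teardown method could call something like stopWithTimeout(process, 30, TimeUnit.SECONDS, 2) and treat a TimeoutException as a test failure instead of hanging. The timeout and attempt count shown here are arbitrary values. Note that this only bounds the wait in the calling JVM. If destroy() really has no effect on a process that is still starting, the child may keep running after the exception is thrown.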