From: "Steve Loughran (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Date: Tue, 2 Aug 2011 15:37:27 +0000 (UTC)
Message-ID: <609909673.1794.1312299447195.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1976214481.14291.1311812229644.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2213) DataNode gets stuck while shutting down minicluster

    [ https://issues.apache.org/jira/browse/HDFS-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076272#comment-13076272 ]

Steve Loughran commented on HDFS-2213:
--------------------------------------

Looking at this a bit more, there are two threads in the jetty pool still live.

One is waiting for input. Most interestingly, at AbstractConnector.java:707 the loop discards all interrupted exceptions and just repeats, checking to see if it's been told to stop. The state variable it checks is volatile, but you'd have to see where the connector is actually stopped, as the thread pool doesn't do it.

"728981380@qtp4-1 - Acceptor0 SelectChannelConnector@localhost.localdomain:45424" prio=10 tid=0x00007f18e8996000 nid=0x6b65 runnable [0x00007f18ecc92000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x00000000f27d7908> (a sun.nio.ch.Util$2)
    - locked <0x00000000f27d78f8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000f27d7460> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
    at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
    at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
    at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

The other one is the pool manager itself: a thread that decides whether or not to add and remove threads.
"115556431@qtp4-0" prio=10 tid=0x00007f18e8542000 nid=0x6b64 in Object.wait() [0x00007f18ece94000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000f27d69d8> (a org.mortbay.thread.QueuedThreadPool$PoolThread) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565) - locked <0x00000000f27d69d8> (a org.mortbay.thread.QueuedThreadPool$PoolThread) Again, interrupts cause it to check its running state, but don't stop the thread itself. At a guess then, I'd say that jetty isn't being shut down properly, with all its lifecycle bits not being stopped first. I've not seen this before in my own code > DataNode gets stuck while shutting down minicluster > --------------------------------------------------- > > Key: HDFS-2213 > URL: https://issues.apache.org/jira/browse/HDFS-2213 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node > Affects Versions: 0.23.0 > Reporter: Todd Lipcon > Fix For: 0.23.0 > > Attachments: stack.txt > > > I've seen a couple times where a unit test has timed out. jstacking shows the cluster is stuck trying to shut down one of the DataNode HTTP servers. The DataNodeBlockScanner thread also seems to be in a tight loop in its main loop. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira