hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2213) DataNode gets stuck while shutting down minicluster
Date Tue, 02 Aug 2011 15:37:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076272#comment-13076272 ]

Steve Loughran commented on HDFS-2213:

Looking at this a bit more, there are two threads in the jetty pool still live.

One is waiting for input. Most interestingly, at AbstractConnector.java:707 the loop discards
all interrupted exceptions and just repeats, checking to see if it's been told to stop.
The state variable it checks is volatile, but you'd have to see where the connector is actually
stopped, as the thread pool doesn't do it.

"728981380@qtp4-1 - Acceptor0 SelectChannelConnector@localhost.localdomain:45424" prio=10
tid=0x00007f18e8996000 nid=0x6b65 runnable [0x00007f18ecc92000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x00000000f27d7908> (a sun.nio.ch.Util$2)
	- locked <0x00000000f27d78f8> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000000f27d7460> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
	at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
	at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
	at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

The other one is the pool manager itself: a thread that decides whether to add or
remove threads.

"115556431@qtp4-0" prio=10 tid=0x00007f18e8542000 nid=0x6b64 in Object.wait() [0x00007f18ece94000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000f27d69d8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
	- locked <0x00000000f27d69d8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

Again, interrupts cause it to check its running state, but don't stop the thread itself.
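
To make the behaviour described for both threads concrete, here is a minimal sketch of a loop that swallows interrupts and only exits when a volatile flag is cleared. This is not Jetty's code: the class AcceptorLoopSketch, its methods, and the use of Thread.sleep as a stand-in for the blocking select/wait call are all assumptions made for illustration.

```java
/**
 * Hypothetical stand-in for an acceptor-style loop; not Jetty's actual code.
 */
public class AcceptorLoopSketch {
    // Volatile so a write from the stopping thread is visible to the loop.
    private volatile boolean running = true;

    private final Thread worker = new Thread(this::runLoop, "sketch-acceptor");

    private void runLoop() {
        while (running) {
            try {
                // Assumption: sleep stands in for a blocking accept/select call.
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                // The interrupt is discarded; the loop only re-checks the flag.
            }
        }
    }

    public void start() {
        worker.start();
    }

    /** Interrupts the worker without touching the flag. */
    public void interrupt() {
        worker.interrupt();
    }

    /** Clears the flag, then interrupts so the worker notices promptly. */
    public void stop() {
        running = false;
        worker.interrupt();
    }

    /** Waits up to the given time and reports whether the worker has exited. */
    public boolean join(long millis) throws InterruptedException {
        worker.join(millis);
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        AcceptorLoopSketch sketch = new AcceptorLoopSketch();
        sketch.start();
        sketch.interrupt();
        System.out.println("exited after interrupt only: " + sketch.join(300));
        sketch.stop();
        System.out.println("exited after stop(): " + sketch.join(2000));
    }
}
```

In this sketch the interrupt alone leaves the thread alive, and only the explicit stop() call ends it, which is why something other than the thread pool has to stop the connector.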

At a guess then, I'd say that jetty isn't being shut down properly: not all of its lifecycle
bits are being stopped first. I've not seen this before in my own code.
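
As a rough illustration of what "stopping the lifecycle bits first" could look like, the sketch below stops components in the reverse of the order they were registered, so a connector registered after the thread pool is stopped before it. The Component interface, the LifecycleStopSketch class, and the component names are hypothetical; they are not Jetty's LifeCycle API, and the reverse-order policy is an assumption rather than something confirmed by the issue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class LifecycleStopSketch {
    /** Hypothetical lifecycle contract; not Jetty's LifeCycle interface. */
    interface Component {
        String name();
        void stop();
    }

    // Most recently registered component sits on top of the stack.
    private final Deque<Component> started = new ArrayDeque<>();
    private final List<String> stopLog = new ArrayList<>();

    /** Registers a named component whose stop() just records its name. */
    public void register(String name) {
        started.push(new Component() {
            public String name() { return name; }
            public void stop() { stopLog.add(name); }
        });
    }

    /** Stops every component, last registered first, and returns the order. */
    public List<String> stopAll() {
        while (!started.isEmpty()) {
            started.pop().stop();
        }
        return stopLog;
    }

    public static void main(String[] args) {
        LifecycleStopSketch server = new LifecycleStopSketch();
        server.register("threadPool");
        server.register("connector");
        System.out.println(server.stopAll());
    }
}
```

The point of the ordering is that each component gets an explicit stop() call before the pool it runs on goes away, rather than relying on interrupts from the pool to end its threads.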

> DataNode gets stuck while shutting down minicluster
> ---------------------------------------------------
>                 Key: HDFS-2213
>                 URL: https://issues.apache.org/jira/browse/HDFS-2213
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>             Fix For: 0.23.0
>         Attachments: stack.txt
> I've seen a couple times where a unit test has timed out. jstacking shows the cluster
> is stuck trying to shut down one of the DataNode HTTP servers. The DataNodeBlockScanner thread
> also seems to be in a tight loop in its main loop.


