hadoop-common-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11333) Fix deadlock in DomainSocketWatcher when the notification pipe is full
Date Tue, 02 Dec 2014 11:43:19 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231371#comment-14231371 ]

Hudson commented on HADOOP-11333:
---------------------------------

ABORTED: Integrated in Hadoop-Hdfs-trunk #1951 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1951/])
HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification pipe is full (zhaoyunjiong via cmccabe) (cmccabe: rev 86e3993def01223f92b8d1dd35f6c1f8ab6033f5)
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java


> Fix deadlock in DomainSocketWatcher when the notification pipe is full
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-11333
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11333
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: zhaoyunjiong
>            Assignee: zhaoyunjiong
>             Fix For: 2.7.0
>
>         Attachments: 11241021, 11241023, 11241025, HADOOP-11333-1.patch, HADOOP-11333.patch
>
>
> Some of our DataNodes hit the "exceeds the limit of concurrent xciever" error; the limit is 4K.
> After checking the stack traces, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0, which is called by DomainSocketWatcher.kick, is stuck:
> {quote}
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1]" daemon prio=10 tid=0x00007f55c5576000 nid=0x385d waiting on condition [0x00007f558d5d4000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000740df9c90> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
>         at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> --
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1]" daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> 	at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> 	at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
> 	at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
> 	at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
> 	at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> 	at java.lang.Thread.run(Thread.java:745)
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1]" daemon prio=10 tid=0x00007f55c5574000 nid=0x377a waiting on condition [0x00007f558d7d6000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000740df9cb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
>         at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>         at java.lang.Thread.run(Thread.java:745)
>              
> "Thread-163852" daemon prio=10 tid=0x00007f55c811c800 nid=0x6757 runnable [0x00007f55aef6e000]
>    java.lang.Thread.State: RUNNABLE 
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.access$800(DomainSocketWatcher.java:52)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:457)
>         at java.lang.Thread.run(Thread.java:745)
> {quote}
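
The hazard suspected in the description above can be sketched in isolation. The following is an illustration only, not the code from DomainSocketWatcher or from the attached patches: it assumes a java.nio.channels.Pipe as a stand-in for the notification channel, and the class and method names (FullPipeSketch, fillPipe) are hypothetical. It fills the channel in non-blocking mode to show that it accepts only a finite number of bytes, then shows that a second thread cannot take a ReentrantLock that the writer still holds. In blocking mode, the writer would stall at that point with the lock held, which matches the shape of the stack traces above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.util.concurrent.locks.ReentrantLock;

class FullPipeSketch {
    /**
     * Writes to the pipe in non-blocking mode until a write is refused.
     * Returns the number of bytes the pipe accepted. Nobody reads from the
     * source end, so the pipe's buffer eventually fills up.
     */
    static long fillPipe(Pipe pipe) throws IOException {
        pipe.sink().configureBlocking(false);
        ByteBuffer chunk = ByteBuffer.allocate(4096);
        long accepted = 0;
        while (true) {
            chunk.clear(); // position = 0, limit = capacity: offer 4096 bytes again
            int n = pipe.sink().write(chunk);
            if (n == 0) {
                return accepted; // a blocking write would stall here
            }
            accepted += n;
        }
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        ReentrantLock lock = new ReentrantLock();

        lock.lock(); // the "kicking" thread takes the lock first
        try {
            long accepted = fillPipe(pipe);
            System.out.println("pipe accepted " + accepted
                + " bytes before refusing further writes");

            // While the writer holds the lock (and, in blocking mode, would be
            // stuck in write), no other thread can acquire it.
            Thread other = new Thread(() ->
                System.out.println("other thread acquired lock: " + lock.tryLock()));
            other.start();
            other.join();
        } finally {
            lock.unlock();
        }
    }
}
```

The sketch uses non-blocking writes and tryLock so that it terminates instead of reproducing the hang. The exact byte count depends on the platform's pipe buffer size.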



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
