hadoop-hdfs-issues mailing list archives

From "zhaoyunjiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7429) DomainSocketWatcher.kick stuck
Date Tue, 25 Nov 2014 10:11:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224325#comment-14224325
] 

zhaoyunjiong commented on HDFS-7429:
------------------------------------

The problem is that on our machines we can send only 299 bytes to the domain socket.
When a thread tries to send the 300th byte, the write blocks while
DomainSocketWatcher.add(DomainSocket sock, Handler handler) still holds the lock, so
watcherThread.run cannot acquire the lock and clear the buffer. The result is a live lock.

I'm not yet sure which configuration sets the buffer size to 299.
For now I suspect net.core.netdev_budget, which is 300 on our machines.
I'll upload a patch later that limits the number of bytes sent, to prevent the live lock.
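
To make the hang described above concrete, here is a minimal sketch in plain Java. It is a toy model, not the Hadoop code and not the patch: the class KickSketch, its methods, and the use of an ArrayBlockingQueue as a stand-in for the socket's send buffer are all assumptions made for illustration. The capacity of 299 is the figure observed above. The point it shows is that a kick which refuses to block (offer) lets add() release the lock even when the buffer is full, so the draining thread can still get in; a blocking put() at the same spot would hold the lock forever.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the hang; none of these names come from Hadoop. */
final class KickSketch {
    // Stand-in for the notification socket's send buffer (299 bytes in the report).
    static final int CAPACITY = 299;

    private final ReentrantLock lock = new ReentrantLock();
    private final BlockingQueue<Byte> pipe = new ArrayBlockingQueue<>(CAPACITY);

    /** Models add(): take the lock, then kick. Returns true if the byte was queued. */
    public boolean addWithNonBlockingKick() {
        lock.lock();
        try {
            // offer() returns false when the buffer is full; put() would block
            // here while still holding the lock, which is the reported hang.
            return pipe.offer((byte) 0);
        } finally {
            lock.unlock();
        }
    }

    /** Models the watcher thread: it needs the same lock before it can drain. */
    public int drain() {
        lock.lock();
        try {
            int n = pipe.size();
            pipe.clear();
            return n;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        KickSketch s = new KickSketch();
        int accepted = 0;
        for (int i = 0; i < 300; i++) {
            if (s.addWithNonBlockingKick()) {
                accepted++;
            }
        }
        System.out.println("accepted=" + accepted);   // accepted=299
        System.out.println("drained=" + s.drain());   // drained=299
    }
}
```

In this model a dropped kick is harmless because one pending byte is enough to wake the drainer; whether the same holds for the real watcher depends on how it consumes the notification bytes.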

By the way, should I move this to the Hadoop Common project?

> DomainSocketWatcher.kick stuck
> ------------------------------
>
>                 Key: HDFS-7429
>                 URL: https://issues.apache.org/jira/browse/HDFS-7429
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhaoyunjiong
>         Attachments: 11241021, 11241023, 11241025
>
>
> I found that some of our DataNodes hit "exceeds the limit of concurrent xciever"; the limit is 4K.
> After checking the stacks, I suspect that org.apache.hadoop.net.unix.DomainSocket.writeArray0,
> which is called by DomainSocketWatcher.kick, is stuck:
> {quote}
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1]" daemon prio=10 tid=0x00007f55c5576000 nid=0x385d waiting on condition [0x00007f558d5d4000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000740df9c90> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:286)
>         at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> --
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1]" daemon prio=10 tid=0x00007f7de034c800 nid=0x7b7 runnable [0x00007f7db06c5000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> 	at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> 	at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:589)
> 	at org.apache.hadoop.net.unix.DomainSocketWatcher.kick(DomainSocketWatcher.java:350)
> 	at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:303)
> 	at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> 	at java.lang.Thread.run(Thread.java:745)
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn [Waiting for operation #1]" daemon prio=10 tid=0x00007f55c5574000 nid=0x377a waiting on condition [0x00007f558d7d6000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x0000000740df9cb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:306)
>         at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:283)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:413)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:172)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:92)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>         at java.lang.Thread.run(Thread.java:745)
>              
> "Thread-163852" daemon prio=10 tid=0x00007f55c811c800 nid=0x6757 runnable [0x00007f55aef6e000]
>    java.lang.Thread.State: RUNNABLE 
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.access$800(DomainSocketWatcher.java:52)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:457)
>         at java.lang.Thread.run(Thread.java:745)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
