hadoop-hdfs-dev mailing list archives

From Rakesh R <rake...@huawei.com>
Subject DFS#close is waiting long time for AckedSeqno response..
Date Wed, 26 Nov 2014 05:54:55 GMT

I can see that DFSClient#close gets into TIMED_WAITING for a long time and does not come out.
While analyzing the issue, I found that the DFSClient fails to communicate with the datanode.
The reason for the failure is that my KDC server was down for some time. On the other side,
the DFSClient waits indefinitely for the ackedSeqno.

I feel that, rather than waiting indefinitely, it could wait for a configurable amount
of time and then close the client. What do others think?

      synchronized (dataQueue) {
        while (!closed) {
          if (lastAckedSeqno >= seqno) {
            break;
          }
          try {
            dataQueue.wait(1000); // when we receive an ack, we notify on
                                  // dataQueue
          } catch (InterruptedException ie) {
            throw new InterruptedIOException(
                "Interrupted while waiting for data to be acknowledged by pipeline");
          }
        }
      }
"pool-8-thread-1" prio=10 tid=0x0000000000b66800 nid=0x59bc in Object.wait() [0x00007fde12b74000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2034)
                - locked <0x00000006386da3b0> (a java.util.LinkedList)
                at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:2019)
                at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2111)
                - locked <0x00000006384b37b0> (a org.apache.hadoop.hdfs.DFSOutputStream)
                at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:856)
                at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:873)
                - locked <0x00000006382fce40> (a org.apache.hadoop.hdfs.DFSClient)
                at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:857)
                at com.huawei.dpa.hdfs.utils.AbstractConn.close(AbstractConn.java:67)
                at com.huawei.dpa.hdfs.utils.HdfsConnPool.updateConnPool(HdfsConnPool.java:149)
                - locked <0x0000000638515288> (a com.huawei.dpa.hdfs.utils.HdfsConnPool)
                at com.huawei.dpa.hdfs.utils.HdfsConnPool.access$100(HdfsConnPool.java:14)
                at com.huawei.dpa.hdfs.utils.HdfsConnPool$1.run(HdfsConnPool.java:137)

Has anyone faced a similar kind of issue? Appreciate any help. Thanks!
