hadoop-mapreduce-user mailing list archives

From Steven Xu <xu...@neusoft.com>
Subject [HDFS] DFSClient does not close a closed socket, resulting in thousands of CLOSE_WAIT sockets, with HDP 2.1/HBase 0.98.0/Hadoop 2.4.0
Date Fri, 29 Aug 2014 09:26:07 GMT
Hello Hadoopers,

When I run HDP 2.1/HBase 0.98.0/Hadoop 2.4.0, I always hit a fatal problem: DFSClient
does not close a closed socket, resulting in thousands of CLOSE_WAIT sockets.
Have you seen the same issue? If so, please share what you found. Thanks a lot.
I have also created issue HDFS-6973 for this.


HBase, acting as an HDFS client, does not close a dead connection to the datanode.
This results in over 30K CLOSE_WAIT sockets, and at some point HBase can no longer
connect to the datanode because there are too many mapped sockets from one host to
another on the same port (50010).
Even after I restart all region servers, the CLOSE_WAIT count keeps increasing:
$ netstat -an|grep CLOSE_WAIT|wc -l
netstat -nap|grep CLOSE_WAIT|grep 6569|wc -l
ps -ef|grep 6569
hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java
-Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
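The netstat pipelines above can be wrapped in a small helper to track how fast the leak grows. This is just an illustrative sketch: the `count_close_wait` helper name and the sample netstat lines are made up for the example, and PID 6569 is the region server PID from the ps output above.

```shell
# count_close_wait PID: count CLOSE_WAIT lines owned by the given PID
# in `netstat -nap`-style output read from stdin.
count_close_wait() {
  grep CLOSE_WAIT | grep -c "$1/"
}

# Canned sample of netstat -nap output; a real run would pipe
# `netstat -nap` into the helper instead.
sample='tcp 0 0 10.0.0.1:34567 10.0.0.2:50010 CLOSE_WAIT 6569/java
tcp 0 0 10.0.0.1:34568 10.0.0.2:50010 CLOSE_WAIT 6569/java
tcp 0 0 10.0.0.1:34569 10.0.0.2:50010 ESTABLISHED 6569/java'

printf '%s\n' "$sample" | count_close_wait 6569   # prints 2
```

Running the helper against `netstat -nap` every minute or so makes the monotonic growth after a region-server restart easy to see.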
I have also reviewed these issues:
HDFS-5697 <https://issues.apache.org/jira/browse/HDFS-5697>
HDFS-5671 <https://issues.apache.org/jira/browse/HDFS-5671>
HDFS-1836 <https://issues.apache.org/jira/browse/HDFS-1836>
HBASE-9393 <https://issues.apache.org/jira/browse/HBASE-9393>
I found that the patches for these issues have already been applied in the
HBase 0.98/Hadoop 2.4.0 source code, but I do not understand why HBase
0.98/Hadoop 2.4.0 still has this issue.
Please check. Thanks a lot.
This code has been added into
BlockReaderFactory.getRemoteBlockReaderFromTcp(). Perhaps another bug is
leading to my issue:



  private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
    if (LOG.isTraceEnabled()) {
      LOG.trace(this + ": trying to create a remote block reader from a " +
          "TCP socket");
    }
    BlockReader blockReader = null;
    while (true) {
      BlockReaderPeer curPeer = null;
      Peer peer = null;
      try {
        curPeer = nextTcpPeer();
        if (curPeer == null) break;
        if (curPeer.fromCache) remainingCacheTries--;
        peer = curPeer.peer;
        blockReader = getRemoteBlockReader(peer);
        return blockReader;
      } catch (IOException ioe) {
        if (isSecurityException(ioe)) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": got security exception while constructing " +
                "a remote block reader from " + peer, ioe);
          }
          throw ioe;
        }
        if ((curPeer != null) && curPeer.fromCache) {
          // Handle an I/O error we got when using a cached peer.  These are
          // considered less serious, because the underlying socket may be
          // stale.
          if (LOG.isDebugEnabled()) {
            LOG.debug("Closed potentially stale remote peer " + peer, ioe);
          }
        } else {
          // Handle an I/O error we got when using a newly created peer.
          LOG.warn("I/O error constructing remote block reader.", ioe);
          throw ioe;
        }
      } finally {
        if (blockReader == null) {
          IOUtils.cleanup(LOG, peer);
        }
      }
    }
    return null;
  }
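The cleanup contract that method relies on can be shown with a simplified, self-contained sketch. The `MockPeer` and `MockBlockReader` types below are stand-ins invented for this example, not the Hadoop classes, and the retry logic is simplified (every failure is treated like a stale cached peer). The invariant is the same, though: on every path where no reader takes ownership of the peer, the peer must be closed, otherwise the half-closed socket lingers in CLOSE_WAIT.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Stand-in for a Peer wrapping a TCP socket to a datanode.
class MockPeer implements Closeable {
    final boolean healthy;
    boolean closed = false;
    MockPeer(boolean healthy) { this.healthy = healthy; }
    @Override public void close() { closed = true; }
}

// Stand-in for a BlockReader; on success it takes ownership of the peer.
class MockBlockReader {
    final MockPeer peer;
    MockBlockReader(MockPeer peer) throws IOException {
        if (!peer.healthy) throw new IOException("stale socket");
        this.peer = peer;
    }
}

public class PeerCleanupSketch {
    // Mirrors the retry loop: try each peer in turn; the finally block must
    // close the peer whenever no reader was constructed from it.
    static MockBlockReader getReader(List<MockPeer> peers) {
        for (MockPeer peer : peers) {
            MockBlockReader reader = null;
            try {
                reader = new MockBlockReader(peer);
                return reader;
            } catch (IOException ioe) {
                // Simplified: treat every failure as a stale peer and retry.
            } finally {
                if (reader == null) {
                    peer.close();  // without this, the socket leaks into CLOSE_WAIT
                }
            }
        }
        return null;
    }
}
```

If sockets still pile up despite this pattern, the leak is presumably on some other path, e.g. a reader that took ownership of the peer but never closed it, which is what the report above suspects.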


