hadoop-hdfs-issues mailing list archives

From "steven xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-6973) DFSClient does not close a closed socket, resulting in thousands of CLOSE_WAIT sockets
Date Fri, 29 Aug 2014 08:59:53 GMT
steven xu created HDFS-6973:

             Summary: DFSClient does not close a closed socket, resulting in thousands of
CLOSE_WAIT sockets
                 Key: HDFS-6973
                 URL: https://issues.apache.org/jira/browse/HDFS-6973
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.4.0
         Environment: RHEL 6.3 -HDP 2.1 -6 RegionServers/Datanode -18T per node -3108Regions
            Reporter: steven xu

HBase, acting as an HDFS client, does not close a dead connection to the datanode.
This results in over 30,000 sockets stuck in CLOSE_WAIT, and at some point HBase can no longer
connect to the datanode because too many sockets are mapped from one host to another on the same port, 50010.
Even after I restart all the RegionServers, the CLOSE_WAIT count keeps increasing.
$ netstat -an | grep CLOSE_WAIT | wc -l
$ netstat -nap | grep CLOSE_WAIT | grep 6569 | wc -l
$ ps -ef | grep 6569
hbase 6569 6556 21 Aug25 ? 09:52:33 /opt/jdk1.6.0_25/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill
-9 %p -Xmx1000m -XX:+UseConcMarkSweepGC
I have also reviewed the related issues.
I found that the patches from those issues have already been applied in the HBase 0.98 / Hadoop 2.4.0
source code, but I do not understand why HBase 0.98 / Hadoop 2.4.0 still has this issue. Please check.
Thanks a lot.
The following code was added in BlockReaderFactory.getRemoteBlockReaderFromTcp(). Perhaps another
bug is causing my problem:
// Some comments here
  private BlockReader getRemoteBlockReaderFromTcp() throws IOException {
    if (LOG.isTraceEnabled()) {
      LOG.trace(this + ": trying to create a remote block reader from a " +
          "TCP socket");
    }
    BlockReader blockReader = null;
    while (true) {
      BlockReaderPeer curPeer = null;
      Peer peer = null;
      try {
        curPeer = nextTcpPeer();
        if (curPeer == null) break;
        if (curPeer.fromCache) remainingCacheTries--;
        peer = curPeer.peer;
        blockReader = getRemoteBlockReader(peer);
        return blockReader;
      } catch (IOException ioe) {
        if (isSecurityException(ioe)) {
          if (LOG.isTraceEnabled()) {
            LOG.trace(this + ": got security exception while constructing " +
                "a remote block reader from " + peer, ioe);
          }
          throw ioe;
        }
        if ((curPeer != null) && curPeer.fromCache) {
          // Handle an I/O error we got when using a cached peer.  These are
          // considered less serious, because the underlying socket may be
          // stale.
          if (LOG.isDebugEnabled()) {
            LOG.debug("Closed potentially stale remote peer " + peer, ioe);
          }
        } else {
          // Handle an I/O error we got when using a newly created peer.
          LOG.warn("I/O error constructing remote block reader.", ioe);
          throw ioe;
        }
      } finally {
        if (blockReader == null) {
          // If no block reader was constructed, close the peer so its
          // socket is not leaked.
          IOUtils.cleanup(LOG, peer);
        }
      }
    }
    return null;
  }
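To illustrate the failure mode being reported: a socket lingers in CLOSE_WAIT when the remote end has closed its side but the local process never calls close(). The sketch below is not HDFS code (the class and method names are made up for illustration); it shows the general cure, which is what the finally/IOUtils.cleanup block above intends: guarantee close() on every exit path, here via try-with-resources.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class PeerCloseSketch {
  // Read one byte with a guaranteed close. try-with-resources closes the
  // socket on both the normal and the exception path, so even if the peer
  // has already shut down, this side never lingers in CLOSE_WAIT.
  static int readWithGuaranteedClose(String host, int port) throws IOException {
    try (Socket s = new Socket(host, port)) {
      return s.getInputStream().read(); // returns -1 once the peer has closed
    }
  }

  // Demo: a throwaway local server accepts a connection and closes it
  // immediately, which is exactly the situation that strands an unclosed
  // client socket in CLOSE_WAIT.
  static int demo() throws Exception {
    try (ServerSocket server = new ServerSocket(0)) {
      Thread t = new Thread(() -> {
        try {
          server.accept().close(); // server side closes right away
        } catch (IOException ignored) {
        }
      });
      t.start();
      int b = readWithGuaranteedClose("127.0.0.1", server.getLocalPort());
      t.join();
      return b;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo()); // -1: orderly EOF, and the client socket is closed
  }
}
```

If a code path can return without closing the socket (for example, an early throw before the cleanup runs), each such leaked connection stays in CLOSE_WAIT until the process exits, which matches the accumulation described above.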
