hbase-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9393) Hbase dose not closing a closed socket resulting in many CLOSE_WAIT
Date Fri, 11 Oct 2013 18:26:43 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792915#comment-13792915 ]

Colin Patrick McCabe commented on HBASE-9393:

I looked into this issue.  I found a few things:

The HDFS socket cache is too small by default and times out too quickly.  Its default size
is 16, but HBase seems to be opening many more connections to the DN than that.  In this situation,
sockets must inevitably be opened and then discarded, leading to sockets in {{CLOSE_WAIT}}.

When you use positional read (aka {{pread}}), we grab a socket from the cache, read from it,
and then immediately put it back.  When you seek and then call {{read}}, we don't put the
socket back at the end.  The assumption behind the normal {{read}} method is that you are
probably going to call {{read}} again, so it holds on to the socket until something else comes
up (such as closing the stream).  In many scenarios, this can lead to {{seek+read}} generating
more sockets in {{CLOSE_WAIT}} than {{pread}}.
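To make the accounting concrete, here is a toy model (not HDFS code; all names are invented) of the behavior described above: a fixed-capacity socket cache where {{pread}} returns its socket immediately, while {{seek+read}} holds it until the stream is closed.

```java
// Toy model of the DFSClient socket cache behavior described above.
// Assumption: pread checks a socket out and returns it right away;
// seek+read checks one out and holds it until the stream is closed.
import java.util.ArrayDeque;
import java.util.Deque;

public class SocketCacheModel {
    static final int CACHE_CAPACITY = 16;         // HDFS default mentioned above
    static final Deque<Integer> cache = new ArrayDeque<>();
    static int openSockets = 0;                   // total sockets ever opened
    static int discarded = 0;                     // closed sockets -> CLOSE_WAIT on the peer

    static int takeSocket() {
        if (!cache.isEmpty()) return cache.pop(); // reuse a cached connection
        return ++openSockets;                     // otherwise "open" a new one
    }

    static void returnSocket(int s) {
        if (cache.size() < CACHE_CAPACITY) cache.push(s);
        else discarded++;                         // cache full: socket is closed
    }

    // pread: take a socket, read, put it straight back
    static void pread() {
        returnSocket(takeSocket());
    }

    // seek+read: take a socket and keep it until the stream is closed
    static int seekRead() {
        return takeSocket();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) pread();
        System.out.println("sockets opened by 1000 preads: " + openSockets);      // 1

        int[] held = new int[100];
        for (int i = 0; i < 100; i++) held[i] = seekRead();                        // 100 open streams
        System.out.println("sockets opened after 100 seek+read streams: " + openSockets); // 100

        for (int s : held) returnSocket(s);       // close all 100 streams at once
        System.out.println("sockets discarded on close: " + discarded);            // 84
    }
}
```

A thousand preads share one cached socket, but a hundred concurrently open seek+read streams pin a hundred sockets, and closing them all overflows the 16-slot cache, discarding 84 connections.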

I don't think we want to alter this HDFS behavior, since it's helpful in the case that you're
reading through the entire file from start to finish, which many HDFS clients do.  It allows
us to make certain optimizations such as reading a few kilobytes at a time, even if the user
only asks for a few bytes at a time.  These optimizations are unavailable with {{pread}} because
it creates a new {{BlockReader}} each time.

So as far as recommendations for HBase go:
* use short-circuit reads whenever possible, since in many cases you can avoid needing a socket
at all and just reuse the same file descriptor
* set the socket cache to a bigger size and adjust the timeouts to be longer (I may explore
changing the defaults in HDFS...)
* if you are going to keep files open for a while and do random reads, use {{pread}}, never {{seek+read}}.
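The first two recommendations map onto client-side settings in {{hdfs-site.xml}}. This is an illustrative fragment; the property names come from the Hadoop 2.x DFSClient of this era and the values are example choices, so verify both against your version before deploying.

```xml
<configuration>
  <!-- Short-circuit local reads: bypass the DN socket entirely -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <!-- example path; must match the datanode's domain socket -->
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
  <!-- Socket cache: default capacity is 16 -->
  <property>
    <name>dfs.client.socketcache.capacity</name>
    <value>256</value>
  </property>
  <property>
    <!-- milliseconds before an idle cached socket is closed -->
    <name>dfs.client.socketcache.expiryMsec</name>
    <value>60000</value>
  </property>
</configuration>
```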

> Hbase dose not closing a closed socket resulting in many CLOSE_WAIT 
> --------------------------------------------------------------------
>                 Key: HBASE-9393
>                 URL: https://issues.apache.org/jira/browse/HBASE-9393
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2
>         Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 7279 regions
>            Reporter: Avi Zrachya
> HBase does not close dead connections to the datanode.
> This results in over 60K sockets in CLOSE_WAIT, and at some point HBase cannot connect
> to the datanode because there are too many mapped sockets from one host to another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart HBase to work
> around the problem; over time it increases to 60-100K sockets in CLOSE_WAIT.
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root     17255 17219  0 12:26 pts/0    00:00:00 grep 21592
> hbase    21592     1 17 Aug29 ?        03:29:06 /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill
-9 %p -Xmx8000m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dhbase.log.dir=/var/log/hbase
-Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...

This message was sent by Atlassian JIRA
