hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9393) Hbase does not closing a closed socket resulting in many CLOSE_WAIT
Date Mon, 25 Jan 2016 19:20:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115799#comment-15115799

stack commented on HBASE-9393:

bq. I believe there is an option to do #1 even right now. Can't HBase be configured just to
use pread and never read?

We want sequential reading when doing long scans (the purported hdfs i/o 'pipelining'). We
want to be able to pick and choose dependent on read-type (short scan or random get vs streaming read).

This issue and a suggestion offlist by [~Apache9] bring up the unfinished project, https://issues.apache.org/jira/browse/HBASE-5979,
which is the proper way to fix what is going on here (as well as doing a proper separation
of long vs short reads). Would be good to revive it. There is good stuff in the cited issue.

Adding the below as a finally in a method named pickReaderVersion seems a bit odd... is pickReaderVersion
the only place we read in the file trailer? That seems odd (not your issue [~ashish singhi]).
You'd think we'd want to keep the trailer around in the reader.

    } finally {
      unbufferStream(fsdis);
    }
  }

On commit, let's point to this issue to explain why we are doing gymnastics in the unbufferStream
method... and why the reflection.
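For the record, a minimal sketch of why the reflection is needed: CanUnbuffer only exists in newer Hadoop, so the code has to probe for an `unbuffer` method at runtime rather than compile against it. The stream class below is a hypothetical stand-in, not the real HDFS type:

```java
import java.lang.reflect.Method;

public class UnbufferViaReflection {
    // Hypothetical stand-in for an HDFS stream that supports CanUnbuffer.
    public static class FakeStream {
        public boolean buffered = true;
        public void unbuffer() { buffered = false; } // in real HDFS this releases socket/buffer resources
    }

    // Invoke unbuffer() only if the stream's class actually exposes it, so the
    // same code runs against older Hadoop versions that lack CanUnbuffer.
    static boolean tryUnbuffer(Object stream) {
        try {
            Method m = stream.getClass().getMethod("unbuffer");
            m.invoke(stream);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // older Hadoop: nothing to do
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        FakeStream s = new FakeStream();
        System.out.println(tryUnbuffer(s) && !s.buffered);   // stream had unbuffer(): true
        System.out.println(tryUnbuffer("no unbuffer here")); // String has no unbuffer(): false
    }
}
```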

Is it odd adding this unbufferStream to hbase types when there is the CanUnbuffer Interface
up in hdfs? Should we have a local hbase equivalent... and put it on HFileBlock, HFileReader...
Then the relation would be more clear? Perhaps overkill?

Why do you think the sequentialRead numbers are so different in your perf test above, [~ashish
singhi]? The extra setup after reading in the trailer?

bq. TestStochasticLoadBalancer failure was not related to the change - it has failed intermittently.
[~yuzhihong@gmail.com] Let me retry the patch. We need a clean build to commit... for any patch.
No more '... it passes for me locally...'. It has to pass up here on apache. If we can't
get it to pass, nothing should get checked in until the tests are fixed. Otherwise our test suite
is for nought and the running of CI just wasted energy at the DC.

> Hbase does not closing a closed socket resulting in many CLOSE_WAIT 
> --------------------------------------------------------------------
>                 Key: HBASE-9393
>                 URL: https://issues.apache.org/jira/browse/HBASE-9393
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2, 0.98.0
>         Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 7279 regions
>            Reporter: Avi Zrachya
>            Assignee: Ashish Singhi
>            Priority: Critical
>             Fix For: 2.0.0
>         Attachments: HBASE-9393.patch, HBASE-9393.v1.patch, HBASE-9393.v2.patch, HBASE-9393.v3.patch,
HBASE-9393.v4.patch, HBASE-9393.v5.patch, HBASE-9393.v5.patch
> HBase does not close a dead connection with the datanode.
> This results in over 60K CLOSE_WAIT sockets, and at some point HBase can not connect to the
datanode because there are too many mapped sockets from one host to another on the same port.
> The example below is with a low CLOSE_WAIT count because we had to restart hbase to solve
the problem; later in time it will increase to 60-100K sockets in CLOSE_WAIT
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root     17255 17219  0 12:26 pts/0    00:00:00 grep 21592
> hbase    21592     1 17 Aug29 ?        03:29:06 /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill
-9 %p -Xmx8000m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dhbase.log.dir=/var/log/hbase
-Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...

This message was sent by Atlassian JIRA
