Date: Mon, 25 Jan 2016 19:20:40 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9393) Hbase does not closing a closed socket resulting in many CLOSE_WAIT

[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115799#comment-15115799 ]

stack commented on HBASE-9393:
------------------------------

bq. I believe there is an option to do #1 even right now. Can't HBase be configured just to use pread and never read?

We want sequential reading when doing long scans (the purported hdfs i/o 'pipelining'). We want to be able to pick and choose depending on read type (short scan or random get vs streaming scan).
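To make the pread vs read distinction concrete, here is a minimal sketch using only the JDK's FileChannel as a stand-in for an HDFS stream. The ReadType enum, usePread and read names are hypothetical and are not HBase or HDFS APIs; the point is only that a positional read leaves the stream position alone while a sequential read advances it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class ReadModeSketch {
    /** Hypothetical read types, named after the cases in the comment. */
    enum ReadType { RANDOM_GET, SHORT_SCAN, STREAMING_SCAN }

    /** Assumption: positional reads for gets and short scans, sequential otherwise. */
    static boolean usePread(ReadType type) {
        return type != ReadType.STREAMING_SCAN;
    }

    /** Reads up to len bytes at offset, choosing the read style by read type. */
    static byte[] read(FileChannel ch, long offset, int len, ReadType type) {
        ByteBuffer buf = ByteBuffer.allocate(len);
        try {
            if (usePread(type)) {
                // Positional read: does not move the channel's own position.
                long pos = offset;
                while (buf.hasRemaining()) {
                    int n = ch.read(buf, pos);
                    if (n < 0) break;
                    pos += n;
                }
            } else {
                // Sequential read: seek once, then consume from the current position.
                ch.position(offset);
                while (buf.hasRemaining()) {
                    if (ch.read(buf) < 0) break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    /** Runs both read styles against a temp file and reports data and channel position. */
    static String demo() {
        try {
            Path tmp = Files.createTempFile("read-mode", ".bin");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    String a = new String(read(ch, 2, 3, ReadType.RANDOM_GET), StandardCharsets.US_ASCII);
                    long afterPread = ch.position();
                    String b = new String(read(ch, 5, 4, ReadType.STREAMING_SCAN), StandardCharsets.US_ASCII);
                    long afterSequential = ch.position();
                    return a + "@" + afterPread + " " + b + "@" + afterSequential;
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "234@0 5678@9"
    }
}
```

The position stays at 0 after the positional read and moves to 9 after the sequential one, which is why the two styles suit different read types.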

This issue, and an offlist suggestion by [~Apache9], brings up the unfinished project https://issues.apache.org/jira/browse/HBASE-5979, which is the proper way to fix what is going on in here (as well as doing proper separation of long vs short read). Would be good to revive. There is good stuff in the cited issue.

Adding the below as a finally in a method named pickReaderVersion seems a bit odd... is pickReaderVersion the only place we read in the file trailer? That seems odd (not your issue [~ashish singhi]). You'd think we'd want to keep the trailer around in the reader.

    } finally {
      unbufferStream(fsdis);
    }
  }

On commit, let's point at this issue to explain why we are doing gymnastics in the unbufferStream method... and why the reflection.

Is it odd adding this unbufferStream to hbase types when there is the Interface CanUnbuffer up in hdfs? Should we have a local hbase equivalent... and put it on HFileBlock, HFileReader... Then the relation is more clear? Perhaps overkill?

Why do you think the sequentialRead numbers are so different in your perf test above, [~ashish singhi]? The extra setup after reading in the trailer?

bq. TestStochasticLoadBalancer failure was not related to the change - it has failed intermittently.

[~yuzhihong@gmail.com] Let me retry the patch. We need a clean build to commit... for any patch. No more '... it passes for me locally...'. It has to pass up here on apache. If we can't get it to pass, nothing should get checked in until tests are fixed. Otherwise our test suite is for nought and the running of CI is just wasted energy at the DC.
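
For readers following the unbufferStream discussion above, here is a rough sketch of the general shape: call unbuffer() reflectively in a finally block once the trailer has been read. It is not the code from the patch. UnbufferableStream and readTrailer are hypothetical stand-ins, and the sketch assumes (the comment above does not say) that reflection is used because the method may be missing on some stream classes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.lang.reflect.Method;

class UnbufferSketch {
    /** Hypothetical stand-in for a stream that can release buffers and sockets. */
    static class UnbufferableStream extends ByteArrayInputStream {
        boolean unbuffered = false;

        UnbufferableStream(byte[] data) {
            super(data);
        }

        public void unbuffer() {
            unbuffered = true;
        }
    }

    /**
     * Calls a public no-arg unbuffer() on the stream if one exists.
     * Returns true only when the method was found and invoked.
     */
    static boolean unbufferStream(InputStream in) {
        if (in == null) {
            return false;
        }
        try {
            Method m = in.getClass().getMethod("unbuffer");
            m.invoke(in);
            return true;
        } catch (NoSuchMethodException e) {
            // Assumed case: this stream type has nothing to release.
            return false;
        } catch (ReflectiveOperationException e) {
            // The method exists but could not be invoked; treat as best effort.
            return false;
        }
    }

    /** Reads the last byte (a toy "trailer") and always unbuffers afterwards. */
    static int readTrailer(UnbufferableStream in, int length) {
        try {
            in.skip(length - 1);
            return in.read();
        } finally {
            unbufferStream(in);
        }
    }

    public static void main(String[] args) {
        UnbufferableStream s = new UnbufferableStream(new byte[] {1, 2, 3});
        System.out.println(readTrailer(s, 3) + " " + s.unbuffered);
    }
}
```

A local interface implemented by the reader types, as floated above, would replace the reflective lookup with an ordinary instanceof check.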

> Hbase does not closing a closed socket resulting in many CLOSE_WAIT
> --------------------------------------------------------------------
>
> Key: HBASE-9393
> URL: https://issues.apache.org/jira/browse/HBASE-9393
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.2, 0.98.0
> Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 7279 regions
> Reporter: Avi Zrachya
> Assignee: Ashish Singhi
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-9393.patch, HBASE-9393.v1.patch, HBASE-9393.v2.patch, HBASE-9393.v3.patch, HBASE-9393.v4.patch, HBASE-9393.v5.patch, HBASE-9393.v5.patch
>
>
> HBase does not close a dead connection with the datanode.
> This results in over 60K sockets in CLOSE_WAIT, and at some point HBase cannot connect to the datanode because there are too many mapped sockets from one host to another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart hbase to solve the problem; over time it will increase to 60-100K sockets in CLOSE_WAIT.
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root 17255 17219 0 12:26 pts/0 00:00:00 grep 21592
> hbase 21592 1 17 Aug29 ? 03:29:06 /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...