Date: Fri, 11 Oct 2013 18:50:43 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9393) Hbase dose not closing a closed socket resulting in many CLOSE_WAIT

    [ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792938#comment-13792938 ]

Colin Patrick McCabe commented on HBASE-9393:
---------------------------------------------

I guess I should also explain why this doesn't happen in branch-1 of Hadoop. The reason is that Hadoop-1 had no socket cache and no grace period before sockets were closed. The client simply opened a new socket each time, performed the op, and then closed it. This resulted in (basically) no sockets in {{CLOSE_WAIT}}. Remember that a socket sits in {{CLOSE_WAIT}} when the peer has closed its end of the connection and the local side has not yet called {{close}}.
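The branch-1 pattern described above (open a fresh socket, perform the op, close it) can be sketched roughly as follows. This is a minimal illustration, not actual DFSClient code; the echo server stands in for a DataNode, and {{performOp}} and {{echoOnce}} are hypothetical helper names:

```java
import java.io.*;
import java.net.*;

public class PerOpSocketDemo {

    /** Hadoop-1 style: open a fresh socket, perform one op, close it.
     *  Hypothetical sketch -- not actual DFSClient code. */
    static String performOp(InetSocketAddress addr, String request) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(addr, 1000);
            PrintWriter out = new PrintWriter(s.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
            out.println(request);
            return in.readLine();
        } // try-with-resources closes the socket: the client ends in CLOSED,
          // never lingering in CLOSE_WAIT
    }

    /** Runs one request against a throwaway local echo server (stand-in for a DataNode). */
    static String echoOnce(String request) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread t = new Thread(() -> {
                try (Socket c = server.accept()) {
                    BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()));
                    PrintWriter out = new PrintWriter(c.getOutputStream(), true);
                    out.println(in.readLine()); // echo the request back
                } catch (IOException ignored) { }
            });
            t.start();
            String reply = performOp(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), server.getLocalPort()),
                request);
            t.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("READ_BLOCK"));
    }
}
```

Because the socket is closed as soon as the op completes, the remote end's FIN is answered promptly and neither side accumulates half-closed connections.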
Keeping sockets open is an optimization, but one that may require you to raise your maximum number of file descriptors. If you are not happy with this tradeoff, you can set {{dfs.client.socketcache.capacity}} to {{0}} and {{dfs.datanode.socket.reuse.keepalive}} to {{0}} to get the old branch-1 behavior. It will be slower, though.

> Hbase dose not closing a closed socket resulting in many CLOSE_WAIT
> --------------------------------------------------------------------
>
>                 Key: HBASE-9393
>                 URL: https://issues.apache.org/jira/browse/HBASE-9393
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2
>        Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 7279 regions
>            Reporter: Avi Zrachya
>
> HBase does not close a dead connection with the datanode.
> This results in over 60K sockets in CLOSE_WAIT, and at some point HBase cannot connect to the datanode because there are too many mapped sockets from one host to another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart HBase to solve the problem; over time it will increase to 60-100K sockets in CLOSE_WAIT.
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root 17255 17219 0 12:26 pts/0 00:00:00 grep 21592
> hbase 21592 1 17 Aug29 ? 03:29:06 /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...

--
This message was sent by Atlassian JIRA
(v6.1#6144)
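The two settings the comment suggests for restoring branch-1 behavior would look like this as an hdfs-site.xml fragment (property names taken from the comment; exactly which file and which Hadoop versions honor them is not specified here):

```xml
<!-- Disable the client socket cache and the DataNode keepalive grace period,
     trading throughput for fewer long-lived sockets in CLOSE_WAIT. -->
<property>
  <name>dfs.client.socketcache.capacity</name>
  <value>0</value>
</property>
<property>
  <name>dfs.datanode.socket.reuse.keepalive</name>
  <value>0</value>
</property>
```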