Date: Mon, 16 Jan 2017 18:36:26 +0000 (UTC)
From: "Shridhar Sahukar (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17468) unread messages in TCP connections - possible connection leak

[ https://issues.apache.org/jira/browse/HBASE-17468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824410#comment-15824410 ]

Shridhar Sahukar commented on HBASE-17468:
------------------------------------------

Thanks Anoop/Ashish.

We can try the patch available at HBASE-9393. Is HBASE-9393-v15.patch the latest?
The fix is marked against the 2.0.0 release. Would it work with 1.2.0? I can try to backport it, but would it cause any side effects?

Thanks

> unread messages in TCP connections - possible connection leak
> -------------------------------------------------------------
>
> Key: HBASE-17468
> URL: https://issues.apache.org/jira/browse/HBASE-17468
> Project: HBase
> Issue Type: Bug
> Reporter: Shridhar Sahukar
> Priority: Critical
>
> We are running HBase 1.2.0-cdh5.7.1 (Cloudera distribution).
> On our Hadoop cluster, we are seeing that each HBase region server has a large number of TCP connections to all the HDFS data nodes and all these connections have unread data in socket buffers.
> Some of these connections are also in CLOSE_WAIT or FIN_WAIT1 state while the rest are in ESTABLISHED state.
> Looks like HBase is creating some connections requesting data from HDFS, but it is forgetting about those connections before it reads the data. Thus the connections are left lingering around with large amounts of data stuck in their receive buffers. Also, it seems HDFS closes these connections after a while, but since there is data in the receive buffer the connection is left in CLOSE_WAIT/FIN_WAIT1 states.
> Below is a snapshot from one of the region servers:
> ## Total number of connections to HDFS (pid of region server is 143722)
> [bda@md-bdadev-42 hbase]$ sudo netstat -anp|grep 143722 | wc -l
> 827
> ## Connections that are not in ESTABLISHED state
> [bda@md-bdadev-42 hbase]$ sudo netstat -anp|grep 143722 | grep -v ESTABLISHED | wc -l
> 344
> ## Snapshot of some of these connections:
> tcp 133887 0 146.1.180.43:48533 146.1.180.40:50010 ESTABLISHED 143722/java
> tcp 82934 0 146.1.180.43:59647 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 0 0 146.1.180.43:50761 146.1.180.27:2181 ESTABLISHED 143722/java
> tcp 234084 0 146.1.180.43:58335 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 967667 0 146.1.180.43:56136 146.1.180.68:50010 ESTABLISHED 143722/java
> tcp 156037 0 146.1.180.43:59659 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 212488 0 146.1.180.43:56810 146.1.180.48:50010 ESTABLISHED 143722/java
> tcp 61871 0 146.1.180.43:53593 146.1.180.35:50010 ESTABLISHED 143722/java
> tcp 121216 0 146.1.180.43:35324 146.1.180.38:50010 ESTABLISHED 143722/java
> tcp 1 0 146.1.180.43:32982 146.1.180.42:50010 CLOSE_WAIT 143722/java
> tcp 82934 0 146.1.180.43:42359 146.1.180.54:50010 ESTABLISHED 143722/java
> tcp 159422 0 146.1.180.43:59731 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 134573 0 146.1.180.43:60210 146.1.180.76:50010 ESTABLISHED 143722/java
> tcp 82934 0 146.1.180.43:59713 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 135765 0 146.1.180.43:44412 146.1.180.29:50010 ESTABLISHED 143722/java
> tcp 161655 0 146.1.180.43:43117 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 75990 0 146.1.180.43:59729 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 78583 0 146.1.180.43:59971 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 1 0 146.1.180.43:39893 146.1.180.67:50010 CLOSE_WAIT 143722/java
> tcp 1 0 146.1.180.43:38834 146.1.180.47:50010 CLOSE_WAIT 143722/java
> tcp 1 0 146.1.180.43:40707 146.1.180.50:50010 CLOSE_WAIT 143722/java
> tcp 106102 0 146.1.180.43:48208 146.1.180.75:50010 ESTABLISHED 143722/java
> tcp 332013 0 146.1.180.43:34795 146.1.180.37:50010 ESTABLISHED 143722/java
> tcp 1 0 146.1.180.43:57644 146.1.180.67:50010 CLOSE_WAIT 143722/java
> tcp 79119 0 146.1.180.43:54438 146.1.180.70:50010 ESTABLISHED 143722/java
> tcp 77438 0 146.1.180.43:35259 146.1.180.38:50010 ESTABLISHED 143722/java
> tcp 1 0 146.1.180.43:57579 146.1.180.41:50010 CLOSE_WAIT 143722/java
> tcp 318091 0 146.1.180.43:60124 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 1 0 146.1.180.43:51715 146.1.180.70:50010 CLOSE_WAIT 143722/java
> tcp 126519 0 146.1.180.43:36389 146.1.180.49:50010 ESTABLISHED 143722/java
> tcp 1 0 146.1.180.43:45656 146.1.180.75:50010 CLOSE_WAIT 143722/java
> tcp 113720 0 146.1.180.43:59741 146.1.180.42:50010 ESTABLISHED 143722/java
> tcp 74599 0 146.1.180.43:44192 146.1.180.60:50010 ESTABLISHED 143722/java
> tcp 131224 0 146.1.180.43:53708 146.1.180.44:50010 ESTABLISHED 143722/java
> tcp 1433915 0 146.1.180.43:57140 146.1.180.67:50010 ESTABLISHED 143722/java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
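
The `netstat | grep | wc -l` counts quoted in the report give totals only. A per-state breakdown with the unread (Recv-Q) byte count may make the leak easier to track over time. The following is a minimal sketch, not part of the original report. The function name `summarize_netstat` is hypothetical. It assumes each row has the seven whitespace-separated columns seen in the snapshot (proto, Recv-Q, Send-Q, local address, foreign address, state, pid/program), and that port 50010 is the DataNode data-transfer port on this cluster, as the snapshot suggests.

```python
from collections import defaultdict


def summarize_netstat(lines, pid, remote_port=None):
    """Group one process's TCP sockets by state: count them and sum Recv-Q bytes.

    Hypothetical helper; assumes rows shaped like the snapshot in the report:
    proto recv-q send-q local-addr foreign-addr state pid/program
    """
    summary = defaultdict(lambda: {"connections": 0, "recv_q_bytes": 0})
    for line in lines:
        # Tolerate mail-quoted rows that start with "> ".
        fields = line.lstrip("> ").split()
        # Skip headers and any row that does not have exactly seven columns.
        if len(fields) != 7 or not fields[0].startswith("tcp"):
            continue
        _, recv_q, _, _, foreign, state, owner = fields
        if owner.split("/")[0] != str(pid):
            continue
        # Optionally keep only sockets whose remote end is the given port.
        if remote_port is not None and foreign.rsplit(":", 1)[-1] != str(remote_port):
            continue
        summary[state]["connections"] += 1
        summary[state]["recv_q_bytes"] += int(recv_q)
    return dict(summary)


# Three rows copied from the snapshot; the 2181 row is filtered out by remote_port.
sample = [
    "tcp 133887 0 146.1.180.43:48533 146.1.180.40:50010 ESTABLISHED 143722/java",
    "tcp 0 0 146.1.180.43:50761 146.1.180.27:2181 ESTABLISHED 143722/java",
    "tcp 1 0 146.1.180.43:32982 146.1.180.42:50010 CLOSE_WAIT 143722/java",
]
result = summarize_netstat(sample, pid=143722, remote_port=50010)
print(result)
```

To use it on a live region server, the output of `sudo netstat -anp` could be saved to a file and its lines passed in. Running it periodically would show whether the ESTABLISHED sockets with non-zero Recv-Q keep accumulating.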