Date: Sat, 7 Jun 2014 05:02:01 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11306) Client connection starvation issues under high load on Amazon EC2

    [ https://issues.apache.org/jira/browse/HBASE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020694#comment-14020694 ]

Andrew Purtell commented on HBASE-11306:
----------------------------------------

Disabling offload prevents connections from getting into a bad state, yes. Worth looking at whether the client can get completely stuck on one stalled connection in other situations too, I think.

> Client connection starvation issues under high load on Amazon EC2
> -----------------------------------------------------------------
>
>                 Key: HBASE-11306
>                 URL: https://issues.apache.org/jira/browse/HBASE-11306
>             Project: HBase
>          Issue Type: Bug
>     Environment: Amazon EC2
>        Reporter: Andrew Purtell
>
> I am using YCSB 0.1.4 with Hadoop 2.2.0 and HBase 0.98.3 RC2 on an EC2 testbed (c3.8xlarge instances, SSD backed, 10 GigE networking). There are five slaves and five separate clients. I start with a prepopulated table of 100M rows over ~20 regions and run 5 YCSB clients concurrently targeting 250,000 ops/sec in aggregate. (Can reproduce this less effectively at 100k ops/sec aggregate also.) Workload A. Due to how I set up the test, the data is all in one HFile per region and very likely in cache. All writes will fit in the aggregate memstore. No flushes or compactions are observed on any server during the test, only the occasional log roll. Despite these favorable conditions, developed over time to isolate this issue, a few of the clients will stop making progress until sockets time out after 60 seconds, leading to very large op latency outliers. With the above detail plus some added extra logging we can rule out storage layer effects. Turning to the network, this is where things get interesting.
>
> I used {{while true ; do clear ; ss -a -o | grep ESTAB | grep 8120 ; sleep 5 ; done}} (8120 is the configured RS data port) to watch receive and send socket queues and TCP-level timers on all of the clients and servers simultaneously during the run.
>
> I have Nagle disabled on the clients and servers and JVM networking set up to use IPv4 only. The YCSB clients are configured to use 20 threads. These threads are expected to share 5 active connections, one to each RegionServer. When the test starts we see exactly what we'd expect: 5 established TCPv4 connections.
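As an aside for anyone reproducing this, here is a rough sketch of the client-side setup described above. The property names are the usual ones for these settings, and the YCSB invocation (jar names, classpath, column family name) is a reconstruction with assumed values rather than the actual test command; the -target value simply splits the 250,000 ops/sec aggregate evenly across the 5 clients.

{noformat}
# hbase-env.sh on clients and servers: restrict JVM networking to IPv4
export HBASE_OPTS="$HBASE_OPTS -Djava.net.preferIPv4Stack=true"

# hbase-site.xml: Nagle disabled on client and server RPC sockets
#   hbase.ipc.client.tcpnodelay = true
#   hbase.ipc.server.tcpnodelay = true

# One of the 5 YCSB clients: workload A, 20 threads, throttled to 50,000 ops/sec
# so that 5 clients together target 250,000 ops/sec (jar and family names assumed)
java -cp "ycsb.jar:hbase-binding.jar:$(hbase classpath)" com.yahoo.ycsb.Client -t \
  -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
  -threads 20 -target 50000 -p columnfamily=family -p recordcount=100000000
{noformat}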
>
> On all servers the recv and send queues were usually empty when sampled. I never saw more than 10K waiting. The servers occasionally retransmitted, but with timers around 200 ms and retry counts around 0.
>
> The client side is another story. We see serious problems like:
> {noformat}
> tcp    ESTAB    0    8733    10.220.15.45:41428    10.220.2.115:8120    timer:(on,38sec,7)
> {noformat}
> That is about 9K of data still waiting to be sent after 7 TCP-level retransmissions.
>
> There is some unfair queueing and packet dropping happening at the network level, but we should be handling this better.
>
> During the periods when YCSB is not making progress, there is only that one connection to one RS in established state. There should be 5 established connections, one to each RS, but the other 4 have been dropped somehow. The one distressed connection remains established for the duration of the problem, while the retransmission timer count on the connection ticks upward. It is dropped once the socket times out at the app level. Why are the connections to the other RegionServers dropped? Why are all threads blocked waiting on the one connection for the socket timeout interval (60 seconds)? After the socket timeout we see the stuck connection dropped and 5 new connections immediately established. YCSB doesn't do anything that would lead to this behavior; it is using separate HTable instances for each client thread and not closing the table references until test cleanup. These behaviors seem internal to the HBase client.
>
> Is maintaining only a single multiplexed connection to each RegionServer the best approach?
>
> A related issue is that we collect zombie sockets in ESTABLISHED state on the server. That is also likely not our fault per se. Keepalives are enabled, so these sockets will eventually be garbage collected by the OS, but on Linux systems that takes 2 hours. We might want to drop connections where we don't see activity sooner than that. Before HBASE-11277 we were spinning indefinitely on a core for each connection in this state.
>
> I have tried this with a narrow range of recent Java 7 and Java 8 runtimes and they all produce the same results. I have also launched several separate EC2 based test clusters and they all produce the same results, so this is a generic platform issue.
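On the keepalive point above, a minimal sketch of where the two-hour figure comes from and how one might reap idle ESTABLISHED sockets sooner at the OS level. The defaults shown are the standard Linux values; the tightened settings are only an illustration, not something tested here.

{noformat}
# Linux defaults: first keepalive probe after 7200 s (2 hours),
# then up to 9 probes at 75 s intervals before the peer is declared dead
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

# Example only: start probing idle connections after 10 minutes instead
sysctl -w net.ipv4.tcp_keepalive_time=600
sysctl -w net.ipv4.tcp_keepalive_intvl=60
sysctl -w net.ipv4.tcp_keepalive_probes=5
{noformat}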