Date: Sat, 7 Jun 2014 05:02:01 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11306) Client connection starvation issues under high load on Amazon EC2

    [ https://issues.apache.org/jira/browse/HBASE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020694#comment-14020694 ]

Andrew Purtell commented on HBASE-11306:
----------------------------------------

Disabling offload prevents connections from getting into a bad state, yes. Worth looking at whether the client can get completely stuck on one stalled connection in other situations too, I think.

> Client connection starvation issues under high load on Amazon EC2
> -----------------------------------------------------------------
>
>                 Key: HBASE-11306
>                 URL: https://issues.apache.org/jira/browse/HBASE-11306
>             Project: HBase
>          Issue Type: Bug
>     Environment: Amazon EC2
>        Reporter: Andrew Purtell
>
> I am using YCSB 0.1.4 with Hadoop 2.2.0 and HBase 0.98.3 RC2 on an EC2 testbed (c3.8xlarge instances, SSD backed, 10 GigE networking). There are five slaves and five separate clients. I start with a prepopulated table of 100M rows over ~20 regions and run 5 YCSB clients concurrently targeting 250,000 ops/sec in aggregate. (Can reproduce this less effectively at 100k ops/sec aggregate also.) Workload A. Due to how I set up the test, the data is all in one HFile per region and very likely in cache. All writes will fit in the aggregate memstore. No flushes or compactions are observed on any server during the test, only the occasional log roll. Despite these favorable conditions, developed over time to isolate this issue, a few of the clients will stop making progress until sockets time out after 60 seconds, leading to very large op latency outliers. With the above detail plus some added extra logging we can rule out storage layer effects. Turning to the network, this is where things get interesting.
>
> I used {{while true ; do clear ; ss -a -o | grep ESTAB | grep 8120 ; sleep 5 ; done}} (8120 is the configured RS data port) to watch receive and send socket queues and TCP-level timers on all of the clients and servers simultaneously during the run.
>
> I have Nagle disabled on the clients and servers and JVM networking set up to use IPv4 only. The YCSB clients are configured to use 20 threads. These threads are expected to share 5 active connections, one to each RegionServer. When the test starts we see exactly what we'd expect: 5 established TCPv4 connections.
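As an aside for anyone reproducing this, here is a rough sketch of the client-side setup described above. The property names are the usual ones for these settings, and the YCSB invocation (jar names, classpath, column family name) is a reconstruction with assumed values rather than the actual test command; the -target value simply splits the 250,000 ops/sec aggregate evenly across the 5 clients.

{noformat}
# hbase-env.sh on clients and servers: restrict JVM networking to IPv4
export HBASE_OPTS="$HBASE_OPTS -Djava.net.preferIPv4Stack=true"

# hbase-site.xml: Nagle disabled on client and server RPC sockets
#   hbase.ipc.client.tcpnodelay = true
#   hbase.ipc.server.tcpnodelay = true

# One of the 5 YCSB clients: workload A, 20 threads, throttled to 50,000 ops/sec
# so that 5 clients together target 250,000 ops/sec (jar and family names assumed)
java -cp "ycsb.jar:hbase-binding.jar:$(hbase classpath)" com.yahoo.ycsb.Client -t \
  -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
  -threads 20 -target 50000 -p columnfamily=family -p recordcount=100000000
{noformat}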
>
> On all servers the recv and send queues were usually empty when sampled. I never saw more than 10K waiting. The servers occasionally retransmitted, but with timers around 200 ms and retry counts around 0.
>
> The client side is another story. We see serious problems like:
> {noformat}
> tcp    ESTAB    0    8733    10.220.15.45:41428    10.220.2.115:8120    timer:(on,38sec,7)
> {noformat}
> That is about 9K of data still waiting to be sent after 7 TCP-level retransmissions.
>
> There is some unfair queueing and packet dropping happening at the network level, but we should be handling this better.
>
> During the periods when YCSB is not making progress, there is only that one connection to one RS in established state. There should be 5 established connections, one to each RS, but the other 4 have been dropped somehow. The one distressed connection remains established for the duration of the problem, while the retransmission timer count on the connection ticks upward. It is dropped once the socket times out at the app level. Why are the connections to the other RegionServers dropped? Why are all threads blocked waiting on the one connection for the socket timeout interval (60 seconds)? After the socket timeout we see the stuck connection dropped and 5 new connections immediately established. YCSB doesn't do anything that would lead to this behavior; it is using separate HTable instances for each client thread and not closing the table references until test cleanup. These behaviors seem internal to the HBase client.
>
> Is maintaining only a single multiplexed connection to each RegionServer the best approach?
>
> A related issue is that we collect zombie sockets in ESTABLISHED state on the server. That is also likely not our fault per se. Keepalives are enabled, so these sockets will eventually be garbage collected by the OS, but on Linux systems that takes 2 hours. We might want to drop connections where we don't see activity sooner than that. Before HBASE-11277 we were spinning indefinitely on a core for each connection in this state.
>
> I have tried this with a narrow range of recent Java 7 and Java 8 runtimes and they all produce the same results. I have also launched several separate EC2 based test clusters and they all produce the same results, so this is a generic platform issue.
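On the keepalive point above, a minimal sketch of where the two-hour figure comes from and how one might reap idle ESTABLISHED sockets sooner at the OS level. The defaults shown are the standard Linux values; the tightened settings are only an illustration, not something tested here.

{noformat}
# Linux defaults: first keepalive probe after 7200 s (2 hours),
# then up to 9 probes at 75 s intervals before the peer is declared dead
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

# Example only: start probing idle connections after 10 minutes instead
sysctl -w net.ipv4.tcp_keepalive_time=600
sysctl -w net.ipv4.tcp_keepalive_intvl=60
sysctl -w net.ipv4.tcp_keepalive_probes=5
{noformat}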