hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11306) Client connection starvation issues under high load on Amazon EC2
Date Sat, 07 Jun 2014 05:02:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020694#comment-14020694
] 

Andrew Purtell commented on HBASE-11306:
----------------------------------------

Disabling offload prevents connections from getting into a bad state, yes. Worth looking at
whether the client can get completely stuck on one stalled connection in other situations, I think.

> Client connection starvation issues under high load on Amazon EC2
> -----------------------------------------------------------------
>
>                 Key: HBASE-11306
>                 URL: https://issues.apache.org/jira/browse/HBASE-11306
>             Project: HBase
>          Issue Type: Bug
>         Environment: Amazon EC2
>            Reporter: Andrew Purtell
>
> I am using YCSB 0.1.4 with Hadoop 2.2.0 and HBase 0.98.3 RC2 on an EC2 testbed (c3.8xlarge
instances, SSD backed, 10 GigE networking). There are five slaves and five separate clients.
I start with a prepopulated table of 100M rows over ~20 regions and run 5 YCSB clients concurrently
targeting 250,000 ops/sec in aggregate. (Can reproduce this, less effectively, at 100k ops/sec
aggregate also.) Workload A. Due to how I set up the test, the data is all in one HFile per
region and very likely in cache. All writes will fit in the aggregate memstore. No flushes
or compactions are observed on any server during the test, only the occasional log roll. Despite
these favorable conditions, developed over time to isolate this issue, a few of the clients
will stop making progress until a socket timeout fires after 60 seconds, leading to very large op
latency outliers. With the above detail plus some added extra logging we can rule out storage
layer effects. Turning to the network, this is where things get interesting.
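> Before getting to the network details, for concreteness each YCSB client is driven by an invocation along these lines (illustrative only, flags from memory; the per-client target works out to 250,000 / 5 = 50,000 ops/sec):
> {noformat}
> # Illustrative only -- not the exact command line used for these runs.
> bin/ycsb run hbase -P workloads/workloada \
>     -p table=usertable -p columnfamily=family \
>     -p recordcount=100000000 \
>     -threads 20 -target 50000 -s
> {noformat}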
> I used {{while true ; do clear ; ss -a -o|grep ESTAB|grep 8120 ; sleep 5 ; done}} (8120
is the configured RS data port) to watch receive and send socket queues and TCP level timers
on all of the clients and servers simultaneously during the run. 
> I have Nagle disabled on the clients and servers and JVM networking set up to use IPv4
only. The YCSB clients are configured to use 20 threads. These threads are expected to share
5 active connections, one to each RegionServer. When the test starts we see exactly what we'd
expect, 5 established TCPv4 connections.
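> For reference, the relevant client and server settings are roughly the following (from memory):
> {noformat}
> # hbase-site.xml (clients and servers): disable Nagle on the IPC sockets
> hbase.ipc.client.tcpnodelay = true
> hbase.ipc.server.tcpnodelay = true
> # JVM flag on the client JVMs: force IPv4
> -Djava.net.preferIPv4Stack=true
> {noformat}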
> On all servers the recv and send queues were usually empty when sampled. I never saw
more than 10K waiting. The servers occasionally retransmitted, but with timers ~200ms and
retry counts ~0.
> The client side is another story. We see serious problems like:
> {noformat}
> tcp    ESTAB      0      8733   10.220.15.45:41428   10.220.2.115:8120     timer:(on,38sec,7)
> {noformat}
> That is about 9K of data still waiting to be sent after 7 TCP level retransmissions.
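> To read the {{ss -o}} output: Send-Q is bytes written but not yet acknowledged by the peer, and the timer field breaks down as:
> {noformat}
> timer:(on,38sec,7)
>        |  |     +-- retransmissions so far
>        |  +-------- time until the retransmission timer next fires
>        +----------- "on" = retransmission timer armed
> {noformat}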

> There is some unfair queueing and packet drops happening at the network level, but we
should be handling this better.
> During the periods when YCSB is not making progress, there is only that one connection
to one RS in established state. There should be 5 established connections, one to each RS,
but the other 4 have been dropped somehow. The one distressed connection remains established
for the duration of the problem, while the retransmission timer count on the connection ticks
upward. It is dropped once the socket times out at the app level. Why are the connections
to the other RegionServers dropped? Why are all threads blocked waiting on the one connection
for the socket timeout interval (60 seconds)? After the socket timeout we see the stuck connection
dropped and 5 new connections immediately established. YCSB doesn't do anything that would
lead to this behavior; it is using separate HTable instances for each client thread and not
closing the table references until test cleanup. These behaviors seem internal to the HBase
client. 
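> To be clear about the usage pattern, this is a minimal sketch of what YCSB is effectively doing against the 0.98 client (not the actual YCSB code): one HTable per worker thread, all built from the same Configuration, so they share one HConnection and therefore one socket per RegionServer.
> {noformat}
> // Minimal sketch of the usage pattern, not the actual YCSB code.
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class ClientSketch {
>   public static void main(String[] args) {
>     final Configuration conf = HBaseConfiguration.create();
>     for (int i = 0; i < 20; i++) {   // 20 worker threads, as in the YCSB runs
>       new Thread(new Runnable() {
>         @Override
>         public void run() {
>           try {
>             // HTable instances created from the same Configuration share the
>             // underlying HConnection, so all RPCs to a given RegionServer are
>             // multiplexed over one socket.
>             HTable table = new HTable(conf, "usertable");
>             table.get(new Get(Bytes.toBytes("user1")));
>           } catch (IOException e) {
>             e.printStackTrace();
>           }
>         }
>       }).start();
>     }
>   }
> }
> {noformat}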
> Is maintaining only a single multiplexed connection to each RegionServer the best approach?
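> (There are client-side pool settings that allow more than one connection per server, if I remember the property names correctly, which might be worth experimenting with:)
> {noformat}
> # client hbase-site.xml -- property names and values from memory, a pointer not a recipe
> hbase.client.ipc.pool.type = RoundRobin
> hbase.client.ipc.pool.size = 5
> {noformat}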

> A related issue is that we collect zombie sockets in ESTABLISHED state on the server. This is
also likely not our fault per se. Keepalives are enabled so they will eventually be garbage collected
by the OS. On Linux systems this will take 2 hours. We might want to drop connections where
we don't see activity sooner than that. Before HBASE-11277 we were spinning indefinitely on
a core for each connection in this state.
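> If we leave it to the OS, the knobs are the usual Linux keepalive sysctls, for example:
> {noformat}
> # Default is 7200 seconds (2 hours) before the first keepalive probe.
> # Something like this would reap dead peers after roughly 10-12 minutes instead:
> sysctl -w net.ipv4.tcp_keepalive_time=600
> sysctl -w net.ipv4.tcp_keepalive_intvl=30
> sysctl -w net.ipv4.tcp_keepalive_probes=5
> {noformat}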
> I have tried this using a narrow range of recent Java 7 and Java 8 runtimes and they
all produce the same results. I have also launched several separate EC2-based test clusters,
with the same results each time, so this is a generic platform issue.



