cassandra-user mailing list archives

From Chris Lohfink <clohfin...@gmail.com>
Subject Re: Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)
Date Mon, 11 Apr 2016 14:15:58 GMT
Where do you get the ~1ms latency between AZs? Comparing a short-term
average to a 99th percentile isn't a fair comparison.

"Over the last month, the median is 2.09 ms, 90th percentile is 20ms,
99th percentile is 47ms." - per
https://www.quora.com/What-are-typical-ping-times-between-different-EC2-availability-zones-within-the-same-region
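
To illustrate why an average and a tail percentile are not comparable, here
is a minimal sketch on a purely synthetic sample. The numbers are made up for
illustration and are not measurements from AWS or from the tests quoted
below; the percentile() helper is a hypothetical nearest-rank implementation.

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Synthetic latencies in ms (assumption, not real data):
# most requests are fast, a few hit a slow path such as a GC pause.
latencies_ms = [1.0] * 90 + [20.0] * 8 + [50.0] * 2

mean_ms = statistics.mean(latencies_ms)
median_ms = percentile(latencies_ms, 50)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} median={median_ms} p95={p95_ms} p99={p99_ms}")
```

In this sample the mean and median stay in the low single digits while the
95th and 99th percentiles sit at the slow-path values, so a small average
says very little about what the tail should look like.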

Are you using EBS? That would further impact read latency, and GC pauses
will always cause hiccups at the 99th percentile and above.

Chris


On Mon, Apr 11, 2016 at 7:57 AM, Alessandro Pieri <sirio7g@gmail.com> wrote:

> Hi everyone,
>
> Last week I ran some tests to estimate the latency overhead introduced in
> a Cassandra cluster by a multi-availability-zone setup on AWS EC2.
>
> I started a Cassandra cluster of 6 nodes deployed on 3 different AZs (2
> nodes/AZ).
>
> Then I used cassandra-stress to run an INSERT (write) test of 20M
> entries with a replication factor of 3. Right after, I ran cassandra-stress
> again to READ 10M entries.
>
> Well, I got the following unexpected result:
>
> Single-AZ, CL=ONE -> median/95th percentile/99th percentile:
> 1.06ms/7.41ms/55.81ms
> Multi-AZ, CL=ONE -> median/95th percentile/99th percentile:
> 1.16ms/38.14ms/47.75ms
>
> Basically, switching to the multi-AZ setup increased the 95th percentile
> latency by ~30ms. That's too much considering that the average network
> latency between AZs on AWS is ~1ms.
>
> Since I couldn't find anything to explain those results, I decided to run
> cassandra-stress specifying only a single node (i.e. "--nodes
> node1" instead of "--nodes node1,node2,node3,node4,node5,node6") and,
> surprisingly, the 95th percentile latency went back down to 5.9 ms.
>
> Trying to recap:
>
> Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th
> percentile: 38.14ms
> Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
>
> For the sake of completeness, I ran a further test using a consistency
> level of LOCAL_QUORUM, and it did not show any large difference between
> using a single node and using multiple ones.
>
> Do you guys know what could be the reason?
>
> The tests were executed on an m3.xlarge (network optimized) using the
> DataStax AMI 2.6.3 running Cassandra v2.0.15.
>
> Thank you in advance for your help.
>
> Cheers,
> Alessandro
>
