From: Alessandro Pieri
Date: Tue, 12 Apr 2016 18:52:14 +0200
Subject: Re: Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)
To: user@cassandra.apache.org

Hi Jack,

As mentioned before, I used m3.xlarge instance types together with two
ephemeral disks in RAID 0 and, according to Amazon, they have "high"
network performance.

I ran many tests, starting with a brand-new cluster every time, and I got
consistent results.

I believe there is something I cannot explain yet in the way the client
used by cassandra-stress connects to the nodes. I'd like to understand why
there is such a big difference:

Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th percentile: 38.14ms
Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
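(For anyone wanting to reproduce: a minimal sketch of those runs with the
2.0-era stress tool. Only "--nodes" is quoted verbatim in this thread; the
other option names and the node addresses here are assumptions, worth
checking against cassandra-stress --help.)

    # Populate 20M rows at RF=3, then read 10M back through all six
    # contact points (option names assumed from the legacy stress tool):
    cassandra-stress --operation INSERT --num-keys 20000000 \
        --replication-factor 3 --consistency-level ONE \
        --nodes node1,node2,node3,node4,node5,node6
    cassandra-stress --operation READ --num-keys 10000000 \
        --consistency-level ONE \
        --nodes node1,node2,node3,node4,node5,node6

    # The same read pinned to a single contact point:
    cassandra-stress --operation READ --num-keys 10000000 \
        --consistency-level ONE --nodes node1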
Hope you can help to figure it out.

Cheers,
Alessandro

On Tue, Apr 12, 2016 at 5:43 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> Which instance type are you using? Some may be throttled for EBS access,
> so you could bump into a rate limit, and who knows what AWS will do at
> that point.
>
> -- Jack Krupansky
>
> On Tue, Apr 12, 2016 at 6:02 AM, Alessandro Pieri <alessandro@getstream.io> wrote:
>> Thanks Chris for your reply.
>>
>> I ran the tests 3 times, for 20 minutes each, and monitored the network
>> latency in the meanwhile; it was very low (even at the 99th percentile).
>>
>> I didn't notice any CPU spike caused by the GC but, as you pointed out,
>> I will look into the GC log, just to be sure.
>>
>> To avoid the problem you mentioned with EBS, and to keep the deviation
>> under control, I used two ephemeral disks in RAID 0.
>>
>> I think the odd results come from the way cassandra-stress deals with
>> multiple nodes. As soon as possible I will go through the Java code to
>> get some more detail.
>>
>> If you have anything else in mind, please let me know; your comments
>> are really appreciated.
>>
>> Cheers,
>> Alessandro
>>
>> On Mon, Apr 11, 2016 at 4:15 PM, Chris Lohfink <clohfink85@gmail.com> wrote:
>>> Where do you get the ~1ms latency between AZs? Comparing a short-term
>>> average to a 99th percentile isn't very fair.
>>>
>>> "Over the last month, the median is 2.09 ms, 90th percentile is 20ms,
>>> 99th percentile is 47ms." - per
>>> https://www.quora.com/What-are-typical-ping-times-between-different-EC2-availability-zones-within-the-same-region
>>>
>>> Are you using EBS? That would further impact latency on reads, and GCs
>>> will always cause hiccups in the 99th+.
>>>
>>> Chris
>>>
>>> On Mon, Apr 11, 2016 at 7:57 AM, Alessandro Pieri <sirio7g@gmail.com> wrote:
>>>> Hi everyone,
>>>>
>>>> Last week I ran some tests to estimate the latency overhead introduced
>>>> in a Cassandra cluster by a multi-availability-zone setup on AWS EC2.
>>>>
>>>> I started a Cassandra cluster of 6 nodes deployed across 3 different
>>>> AZs (2 nodes/AZ).
>>>>
>>>> Then I used cassandra-stress to run an INSERT (write) test of 20M
>>>> entries with a replication factor of 3; right after, I ran
>>>> cassandra-stress again to READ 10M entries.
>>>>
>>>> Well, I got the following unexpected result:
>>>>
>>>> Single-AZ, CL=ONE -> median/95th percentile/99th percentile: 1.06ms/7.41ms/55.81ms
>>>> Multi-AZ, CL=ONE -> median/95th percentile/99th percentile: 1.16ms/38.14ms/47.75ms
>>>>
>>>> Basically, switching to the multi-AZ setup increased the latency by
>>>> ~30ms. That's too much considering that the average network latency
>>>> between AZs on AWS is ~1ms.
>>>>
>>>> Since I couldn't find anything to explain those results, I decided to
>>>> run cassandra-stress specifying only a single node entry (i.e.
>>>> "--nodes node1" instead of "--nodes node1,node2,node3,node4,node5,node6")
>>>> and, surprisingly, the latency went back to 5.9 ms.
>>>>
>>>> Trying to recap:
>>>>
>>>> Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th percentile: 38.14ms
>>>> Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
>>>>
>>>> For the sake of completeness, I ran a further test using a consistency
>>>> level of LOCAL_QUORUM, and that test did not show any large variance
>>>> between using a single node and multiple ones.
>>>>
>>>> Do you guys know what could be the reason?
>>>>
>>>> The tests were executed on m3.xlarge instances (network optimized)
>>>> using the DataStax AMI 2.6.3 running Cassandra v2.0.15.
>>>>
>>>> Thank you in advance for your help.
>>>>
>>>> Cheers,
>>>> Alessandro
>>
>> --
>> Alessandro Pieri
>> Software Architect @ Stream.io Inc
>> e-Mail: alessandro@getstream.io - twitter: sirio7g
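For completeness, the two-ephemeral-disk RAID 0 mentioned a couple of
times above is typically assembled along these lines on an m3.xlarge (a
sketch: the device names, filesystem, and mount point are assumptions and
vary by AMI and instance):

    # Stripe the two instance-store volumes into one array; verify the
    # actual device names with lsblk before running this.
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
    mkfs.ext4 /dev/md0
    mount /dev/md0 /var/lib/cassandra   # Cassandra's default data directory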