From: Alessandro Pieri
Date: Tue, 12 Apr 2016 18:52:14 +0200
Subject: Re: Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)
To: user@cassandra.apache.org

Hi Jack,

As mentioned before, I used m3.xlarge instance types together with two
ephemeral disks in RAID 0 and, according to Amazon, they have "high"
network performance.

I ran many tests, starting with a brand-new cluster every time, and I got
consistent results.

I believe there is something I cannot explain yet in the way the client
used by cassandra-stress connects to the nodes. I'd like to understand why
there is such a big difference:

Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th percentile: 38.14ms
Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
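(For anyone wanting to reproduce: a minimal sketch of those runs with the
2.0-era stress tool. Only "--nodes" is quoted verbatim in this thread; the
other option names and the node addresses here are assumptions, worth
checking against cassandra-stress --help.)

    # Populate 20M rows at RF=3, then read 10M back through all six
    # contact points (option names assumed from the legacy stress tool):
    cassandra-stress --operation INSERT --num-keys 20000000 \
        --replication-factor 3 --consistency-level ONE \
        --nodes node1,node2,node3,node4,node5,node6
    cassandra-stress --operation READ --num-keys 10000000 \
        --consistency-level ONE \
        --nodes node1,node2,node3,node4,node5,node6

    # The same read pinned to a single contact point:
    cassandra-stress --operation READ --num-keys 10000000 \
        --consistency-level ONE --nodes node1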
Hope you can help to figure it out.

Cheers,
Alessandro

On Tue, Apr 12, 2016 at 5:43 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> Which instance type are you using? Some may be throttled for EBS access,
> so you could bump into a rate limit, and who knows what AWS will do at
> that point.
>
> -- Jack Krupansky
>
> On Tue, Apr 12, 2016 at 6:02 AM, Alessandro Pieri <alessandro@getstream.io> wrote:
>> Thanks Chris for your reply.
>>
>> I ran the tests 3 times, for 20 minutes each, and monitored the network
>> latency in the meanwhile; it was very low (even at the 99th percentile).
>>
>> I didn't notice any CPU spike caused by the GC but, as you pointed out,
>> I will look into the GC log, just to be sure.
>>
>> To avoid the problem you mentioned with EBS, and to keep the deviation
>> under control, I used two ephemeral disks in RAID 0.
>>
>> I think the odd results come from the way cassandra-stress deals with
>> multiple nodes. As soon as possible I will go through the Java code to
>> get some more detail.
>>
>> If you have anything else in mind, please let me know; your comments
>> are really appreciated.
>>
>> Cheers,
>> Alessandro
>>
>> On Mon, Apr 11, 2016 at 4:15 PM, Chris Lohfink <clohfink85@gmail.com> wrote:
>>> Where do you get the ~1ms latency between AZs? Comparing a short-term
>>> average to a 99th percentile isn't very fair.
>>>
>>> "Over the last month, the median is 2.09 ms, 90th percentile is 20ms,
>>> 99th percentile is 47ms." - per
>>> https://www.quora.com/What-are-typical-ping-times-between-different-EC2-availability-zones-within-the-same-region
>>>
>>> Are you using EBS? That would further impact latency on reads, and GCs
>>> will always cause hiccups in the 99th+.
>>>
>>> Chris
>>>
>>> On Mon, Apr 11, 2016 at 7:57 AM, Alessandro Pieri <sirio7g@gmail.com> wrote:
>>>> Hi everyone,
>>>>
>>>> Last week I ran some tests to estimate the latency overhead introduced
>>>> in a Cassandra cluster by a multi-availability-zone setup on AWS EC2.
>>>>
>>>> I started a Cassandra cluster of 6 nodes deployed across 3 different
>>>> AZs (2 nodes/AZ).
>>>>
>>>> Then I used cassandra-stress to run an INSERT (write) test of 20M
>>>> entries with a replication factor of 3; right after, I ran
>>>> cassandra-stress again to READ 10M entries.
>>>>
>>>> Well, I got the following unexpected result:
>>>>
>>>> Single-AZ, CL=ONE -> median/95th percentile/99th percentile: 1.06ms/7.41ms/55.81ms
>>>> Multi-AZ, CL=ONE -> median/95th percentile/99th percentile: 1.16ms/38.14ms/47.75ms
>>>>
>>>> Basically, switching to the multi-AZ setup increased the latency by
>>>> ~30ms. That's too much considering that the average network latency
>>>> between AZs on AWS is ~1ms.
>>>>
>>>> Since I couldn't find anything to explain those results, I decided to
>>>> run cassandra-stress specifying only a single node entry (i.e.
>>>> "--nodes node1" instead of "--nodes node1,node2,node3,node4,node5,node6")
>>>> and, surprisingly, the latency went back to 5.9 ms.
>>>>
>>>> Trying to recap:
>>>>
>>>> Multi-AZ, CL=ONE, "--nodes node1,node2,node3,node4,node5,node6" -> 95th percentile: 38.14ms
>>>> Multi-AZ, CL=ONE, "--nodes node1" -> 95th percentile: 5.9ms
>>>>
>>>> For the sake of completeness, I ran a further test using a consistency
>>>> level of LOCAL_QUORUM, and that test did not show any large variance
>>>> between using a single node and multiple ones.
>>>>
>>>> Do you guys know what could be the reason?
>>>>
>>>> The tests were executed on m3.xlarge instances (network optimized)
>>>> using the DataStax AMI 2.6.3 running Cassandra v2.0.15.
>>>>
>>>> Thank you in advance for your help.
>>>>
>>>> Cheers,
>>>> Alessandro
>>
>> --
>> Alessandro Pieri
>> Software Architect @ Stream.io Inc
>> e-Mail: alessandro@getstream.io - twitter: sirio7g
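For completeness, the two-ephemeral-disk RAID 0 mentioned a couple of
times above is typically assembled along these lines on an m3.xlarge (a
sketch: the device names, filesystem, and mount point are assumptions and
vary by AMI and instance):

    # Stripe the two instance-store volumes into one array; verify the
    # actual device names with lsblk before running this.
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
    mkfs.ext4 /dev/md0
    mount /dev/md0 /var/lib/cassandra   # Cassandra's default data directory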