Subject: Re: Unexpected high internode network activity
From: Gianluca Borello <gianluca@sysdig.com>
To: user@cassandra.apache.org
Date: Thu, 25 Feb 2016 20:12:11 -0800
Thank you for your reply. To answer your points:

- I fully agree on the write volume; in fact, my isolated tests confirm your estimation
- About the read, I agree as well, but the volume of data is still much higher
- I am writing to one single keyspace with RF 3; there's just one keyspace
- I am not using any indexes; the column families are very simple
- I am aware of the double count: I measured the traffic on port 9042 at the client side (so it is counted once), and I divided the traffic on port 7000, as measured on each node, by two (35 GB -> 17.5 GB). All the measurements have been done with iftop with proper bpf filters on the port, and the total traffic matches what I see in CloudWatch (divided by two)

So unfortunately I still don't have any idea about what's going on and why I'm seeing 17 GB of internode traffic instead of ~5-6.

On Thursday, February 25, 2016, daemeon reiydelle <daemeonr@gmail.com> wrote:

> If you read & write at quorum, then you write 3 copies of the data before returning to the caller; when reading, you read one copy (assuming it is not on the coordinator) and 1 digest (because read at quorum is 2, not 3).
>
> When you insert, how many keyspaces get written to? (Are you using e.g. inverted indices?) That is my guess: that your db has about 1.8 bytes written for every byte inserted.
>
> Every byte you write is counted also as a read (system A sends 1 GB to system B, so system B receives 1 GB). You would not be charged if intra-AZ, but inter-AZ and inter-DC will get that double count.
>
> So, my guess is reverse indexes, and you forgot to include receive and transmit.
>
> .......
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Thu, Feb 25, 2016 at 6:51 PM, Gianluca Borello <gianluca@sysdig.com> wrote:
>
>> Hello,
>>
>> We have a Cassandra 2.1.9 cluster on EC2 for one of our live applications.
>> There's a total of 21 nodes across 3 AWS availability zones, c3.2xlarge instances.
>>
>> The configuration is pretty standard: we use the default settings that come with the DataStax AMI, and the driver in our application is configured to use lz4 compression. The keyspace where all the activity happens has RF 3, and we read and write at quorum to get strong consistency.
>>
>> While analyzing our monthly bill, we noticed that the amount of network traffic related to Cassandra was significantly higher than expected. After breaking it down by port, it seems like, over any given time, the internode network activity is 6-7 times higher than the traffic on port 9042, whereas we would expect something around 2-3 times, given the replication factor and the consistency level of our queries.
>>
>> For example, this is the network traffic broken down by port and direction over a few minutes, measured as the sum over each node:
>>
>> Port 9042 from client to cluster (write queries): 1 GB
>> Port 9042 from cluster to client (read queries): 1.5 GB
>> Port 7000: 35 GB, which must be divided by two because the traffic is always directed to another instance of the cluster, so that makes it 17.5 GB of generated traffic
>>
>> The traffic on port 9042 completely matches our expectations: we do about 100k write operations writing 10 KB binary blobs for each query, and a bit more reads on the same data.
>>
>> According to our calculations, in the worst case, when the coordinator of the query is not a replica for the data, this should generate about (1 + 1.5) * 3 = 7.5 GB, and instead we see 17 GB, which is quite a lot more.
>>
>> Also, hinted handoffs are disabled and nodes are healthy over the period of observation, and I get the same numbers across pretty much every time window, even including an entire 24-hour period.
>>
>> I tried to replicate this problem in a test environment, so I connected a client to a test cluster running in a bunch of Docker containers (same parameters; essentially the only difference is the GossipingPropertyFileSnitch instead of the EC2 one), and I always get what I expect: the amount of traffic on port 7000 is between 2 and 3 times the amount of traffic on port 9042, and the queries are pretty much the same ones.
>>
>> Before doing more analysis, I was wondering if someone has an explanation for this problem, since perhaps we are missing something obvious here?
>>
>> Thanks
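To make the expectation in the thread concrete, here is a rough back-of-the-envelope model of the internode (port 7000) traffic under RF 3 and QUORUM reads/writes. This is a sketch only, not anything from the thread itself: the function name is made up, digest responses are treated as zero bytes, and compression, gossip, and read repair overhead are ignored. Note that the "(1 + 1.5) * 3 = 7.5 GB" figure quoted above multiplies reads by 3 as well, which is an upper bound; a QUORUM read only ships one full copy plus a small digest, so this model lands a bit lower, in the same ~2-5 GB ballpark as the "~5-6" expectation, and still far below the 17.5 GB observed.

```python
def expected_internode_gb(client_write_gb, client_read_gb, rf=3,
                          coordinator_is_replica=False):
    """Rough estimate of internode traffic for RF=3 with QUORUM
    reads and writes. Digest responses are modeled as ~0 bytes."""
    # Writes: the coordinator forwards each mutation to the replicas
    # (all rf of them when the coordinator holds no copy itself).
    replicas_to_forward = rf - 1 if coordinator_is_replica else rf
    write_traffic = client_write_gb * replicas_to_forward

    # Reads at QUORUM (2 of 3): one replica returns the full data and
    # another returns only a small digest (ignored here). If the
    # coordinator is itself a replica, the full copy is read locally
    # and essentially no read data crosses the internode link.
    read_traffic = 0.0 if coordinator_is_replica else client_read_gb

    return write_traffic + read_traffic

# Numbers from the thread: clients wrote ~1 GB and read ~1.5 GB.
worst = expected_internode_gb(1.0, 1.5, coordinator_is_replica=False)  # 4.5
best = expected_internode_gb(1.0, 1.5, coordinator_is_replica=True)    # 2.0
```

With 21 nodes, a token-aware driver would put the coordinator on a replica for most queries, so the real expectation sits between these two bounds, consistent with the "2-3 times the port 9042 traffic" rule of thumb in the original message.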