Subject: Re: Cassandra counters replication uses more traffic than client increments?
From: Sylvain Lebresne <sylvain@datastax.com>
To: user@cassandra.apache.org
Date: Wed, 9 Jan 2013 08:24:20 +0100

Since you're asking about counters, I'll note too that the internal
representation of counters is pretty fat. In your RF=2 case, each counter
is probably about 64 bytes internally, while on the client side you send
only an 8-byte value for each increment. So I don't think there is anything
unexpected in having more traffic server to server than client to client.

--
Sylvain
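As a rough back-of-envelope check of that ~64-byte figure, here is a minimal
sketch assuming the pre-1.2 counter context layout of one shard per replica,
each shard holding a 16-byte counter id, an 8-byte clock and an 8-byte count
(the per-shard sizes are an assumption for illustration, not something
stated in the thread):

    // Hypothetical size model of a counter's internal representation.
    public class CounterSizeEstimate {
        public static void main(String[] args) {
            int counterIdBytes = 16; // assumed: a TimeUUID naming the replica
            int clockBytes = 8;      // assumed: logical clock (long)
            int countBytes = 8;      // assumed: running total (long)
            int shardBytes = counterIdBytes + clockBytes + countBytes; // 32

            int replicasPerDc = 2;   // RF=2, as in the thread
            int contextBytes = replicasPerDc * shardBytes;             // 64

            int clientDeltaBytes = 8; // the single long the client sends

            System.out.printf("internal: ~%d bytes, client delta: %d bytes (%.0fx)%n",
                    contextBytes, clientDeltaBytes,
                    (double) contextBytes / clientDeltaBytes);
        }
    }

Under those assumptions, an increment that costs the client 8 bytes occupies
roughly 64 bytes once each replica's shard is carried along, so a fatter
replication stream is expected.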
On Wed, Jan 9, 2013 at 3:11 AM, aaron morton <aaron@thelastpickle.com> wrote:

> Can you measure the incoming client traffic on the nodes in DC 1 on port
> 9160? That would be more of an Apples to Apples comparison.
>
>> I've taken a look at some of the captured packets and it looks like
>> there's much more service information in DC-to-DC traffic compared to
>> client-to-server traffic -- although I am by no means certain here.
>
> In addition to writes, the potential sources of cross-DC traffic are
> Gossip and Repair. Gossip is pretty lightweight (for a 4 node cluster) and
> repair only happens if you ask it to. There could also be hints delivered
> from DC 1 to DC 2; these would show up in the logs on DC 1.
>
> Off the top of my head, the internal RowMutation serialisation is not too
> different to the Thrift mutation messages.
>
> There is also a message header; it includes: source IP, an int for the
> verb, some overhead for the key/values, the string FORWARD and the
> forwarding IP address.
>
> Compare this to a mutation message: keyspace name, row key, column family
> ID (int), column name, value + list/hash overhead.
>
> So for small single column updates the ratio of overhead to payload is
> kind of high.
>
>> - Is it indeed the case that server-to-server replication traffic can be
>> significantly more bloated than client-to-server traffic? Or do I need to
>> review my testing methodology?
>
> The meta data on the inter-node messages is pretty static; the bigger the
> payloads, the lower the ratio of overhead to payload. This is the same as
> messages that go between nodes within the same DC.
>
>> - Is there anything that can be done to reduce cross-DC replication
>> traffic? Perhaps some compression scheme?
>
> Fixed in 1.2:
> https://issues.apache.org/jira/browse/CASSANDRA-3127?attachmentOrder=desc
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
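To make Aaron's overhead-to-payload point concrete, here is a minimal sketch
with illustrative byte counts; only the header fields come from his list
above, and every individual size is a guess rather than the actual
serialization format:

    // Hypothetical model: fixed inter-node header vs. mutation payload.
    public class OverheadRatio {
        public static void main(String[] args) {
            // source IP, verb int, key/value overhead, "FORWARD", forwarding IP
            int headerBytes = 4 + 4 + 16 + 7 + 4;

            // keyspace name, row key, CF id, column name, value, list/hash overhead
            int mutationBytes = 10 + 8 + 4 + 10 + 8 + 16;

            for (int updates : new int[] {1, 10, 100}) {
                double ratio = (double) headerBytes / (updates * mutationBytes);
                System.out.printf("%3d updates/message -> overhead/payload ~ %.3f%n",
                        updates, ratio);
            }
        }
    }

The header is paid once per message, so the ratio that is "kind of high" for
a single small column update shrinks quickly as more mutations share one
message, matching Aaron's note that bigger payloads lower the overhead ratio.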
> On 8/01/2013, at 11:36 PM, Sergey Olefir <solf.lists@gmail.com> wrote:
>
> So with the holidays hopefully being over, I thought I'd ask again :)
>
> Could someone please help with answers to the two questions:
> - Is it reasonable to expect that cross-datacenter node-to-node
> replication traffic is greater than the actual client-to-server traffic
> that generates this activity? Specifically talking about counter
> increments.
> - Is there anything that can be done to lower the amount of
> cross-datacenter replication traffic while keeping actual replication
> going (i.e. we can't afford to not replicate data, but we can afford e.g.
> delays in replication)?
>
> Best regards,
> Sergey
>
> Sergey Olefir wrote:
>
> Hi,
>
> as part of our ongoing tests with Cassandra, we've tried to evaluate the
> amount of traffic generated in client-to-server and server-to-server
> (replication) scenarios.
>
> The results we are getting are surprising.
>
> Our setup:
> - Cassandra 1.1.7.
> - 3 DCs with 2 nodes each.
> - NetworkTopology replication strategy with 2 replicas per DC (so
> basically each node contains the full data set).
> - 100 clients concurrently incrementing counters at a rate of roughly
> 100/second each (i.e. about 10k increments per second). Clients perform
> writes to DC:1 only; server-to-server traffic measurement was done in
> DC:2.
> - Clients use batches to write to the server (up to 100 increments per
> batch; overall each client writes 1 or 2 batches per second).
>
> Clients are Java-based, accessing Cassandra via Hector, and run on a
> Windows box.
>
> Traffic measurement for clients (on Windows) was done via Resource
> Monitor and packet capture via Network Monitor. The overall traffic
> appears to be roughly 700KB/sec (kilobytes) for ~10000 increments per
> second.
>
> Traffic measurement for server-to-server was done on DC:2 via packet
> capture. This capture specifically included only nodes in other
> datacenters (so no intra-DC traffic was captured).
>
> The vast majority of traffic was directed to one node, DC:2-1; DC:2-2
> received about 1/30 of the traffic. I think I've read somewhere that
> Cassandra directs DC-to-DC traffic to one node, so this makes sense.
>
> What is surprising, though, is the amount of traffic. It looks to be
> roughly twice the total traffic generated by clients, i.e. something
> like 1.5MB/sec (megabytes). Note: this only counts incoming traffic.
>
> I've taken a look at some of the captured packets and it looks like
> there's much more service information in DC-to-DC traffic compared to
> client-to-server traffic, although I am by no means certain here.
>
> Overall I have a couple of questions:
> - Is it indeed the case that server-to-server replication traffic can be
> significantly more bloated than client-to-server traffic? Or do I need
> to review my testing methodology?
> - Is there anything that can be done to reduce cross-DC replication
> traffic? Perhaps some compression scheme? Or some delay before
> replication, allowing for possibly more increments to be merged together?
>
> Best regards,
> Sergey
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-counters-replication-uses-more-traffic-than-client-increments-tp7584412p7584620.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive
> at Nabble.com.
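As a follow-up on the compression question: if I recall the stock 1.2
cassandra.yaml correctly, the CASSANDRA-3127 change Aaron links is
controlled by a single setting, so the scheme Sergey asks about looks
roughly like:

    # cassandra.yaml (1.2); 'all' compresses all inter-node traffic,
    # 'dc' compresses only cross-datacenter traffic, 'none' disables it
    internode_compression: dc

With 'dc', intra-DC messages stay uncompressed while the cross-DC
replication stream measured above is compressed.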