cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sergey Olefir <solf.li...@gmail.com>
Subject Re: Cassandra counters replication uses more traffic than client increments?
Date Tue, 08 Jan 2013 10:36:50 GMT
So with the holidays hopefully being over, I thought I'd ask again :)

Could someone please help with answers to the two questions:
- Is it reasonable to expect that cross-datacenter node-to-node replication
traffic is greater than actual client-to-server traffic that generates this
activity? Specifically talking about counter increments.
- Is there anything that can be done to lower the amount of cross-datacenter
replication traffic while keeping actual replication going (i.e. we can't
afford to not replicate data, but we can afford e.g. delays in replication)?

Best regards,
Sergey


Sergey Olefir wrote
> Hi,
> 
> as part of our ongoing tests with Cassandra, we've tried to evaluate the
> amount of traffic generated in client-to-server and server-to-server
> (replication scenarios).
> 
> The results we are getting are surprising.
> 
> Our setup:
> - Cassandra 1.1.7.
> - 3 DC with 2 nodes each.
> - NetworkTopology replication strategy with 2 replicas per DC (so
> basically each node contains full data set).
> - 100 clients concurrently incrementing counters at the rate of the
> roughly 100 / second (i.e. about 10k increments per second). Clients
> perform writes to DC:1 only. server-to-server traffic measurement was done
> in DC:2.
> - Clients use batches to write to the server (up to 100 increments per
> batch, overall each client writes 1 or 2 batches per second).
> 
> Clients are Java-based accessing Cassandra via hector. Run on Windows box.
> 
> Traffic measurement for clients (on Windows) was done via Resource Monitor
> and packet capture via Network Monitor. The overall traffic appears to be
> roughly 700KB/sec (kilobytes) for ~10000 increments).
> 
> Traffic measurement for server-to-server was done on DC:2 via packet
> capture. This capture specifically included only nodes in other
> datacenters (so no internal DC traffic was captured).
> 
> The vast majority of traffic was directed to one node DC:2-1. DC2-2
> received like 1/30 of the traffic. I think I've read somewhere that
> Cassandra directs DC-to-DC traffic to one node, so this makes sense.
> 
> What is surprising though -- is the amount of traffic. It looks to be
> roughly twice the amount of the total traffic generated by clients, i.e.
> something like 1.5MB/sec (megabytes). Note -- this only counts incoming
> traffic.
> 
> I've taken a look at some of the captured packets and it looks like
> there's much more service information in DC-to-DC traffic compared to
> client-to-server traffic -- although I am by no means certain here.
> 
> 
> Overall I have a couple of questions:
> - Is it indeed the case that server-to-server replication traffic can be
> significantly more bloated than client-to-server traffic? Or do I need to
> review my testing methodology?
> - Is there anything that can be done to reduce cross-DC replication
> traffic? Perhaps some compression scheme? Or some delay before replication
> allowing for possibly more increments to be merged together?
> 
> 
> Best regards,
> Sergey





--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-counters-replication-uses-more-traffic-than-client-increments-tp7584412p7584620.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Mime
View raw message