incubator-cassandra-user mailing list archives

From Sergey Olefir <solf.li...@gmail.com>
Subject Cassandra counters replication uses more traffic than client increments?
Date Fri, 21 Dec 2012 16:11:44 GMT
Hi,

As part of our ongoing tests with Cassandra, we've tried to evaluate the
amount of traffic generated in client-to-server and server-to-server
(replication) scenarios.

The results we are getting are surprising.

Our setup:
- Cassandra 1.1.7.
- 3 DCs with 2 nodes each.
- NetworkTopology replication strategy with 2 replicas per DC (so basically
each node contains full data set).
- 100 clients concurrently incrementing counters at a rate of roughly
100 per second each (i.e. about 10k increments per second in total). Clients
perform writes to DC:1 only. Server-to-server traffic was measured in DC:2.
- Clients use batches to write to the server (up to 100 increments per
batch, overall each client writes 1 or 2 batches per second).

Clients are Java-based and access Cassandra via Hector. They run on a Windows box.

Traffic measurement for clients (on Windows) was done via Resource Monitor
and packet capture via Network Monitor. The overall traffic appears to be
roughly 700KB/sec (kilobytes) for ~10000 increments per second.

Traffic measurement for server-to-server was done on DC:2 via packet
capture. This capture specifically included only nodes in other datacenters
(so no internal DC traffic was captured).

The vast majority of traffic was directed to one node, DC:2-1. DC:2-2 received
roughly 1/30 of the traffic. I think I've read somewhere that Cassandra directs
DC-to-DC traffic to one node, so this makes sense.

What is surprising, though, is the amount of traffic. It looks to be
roughly twice the total traffic generated by the clients, i.e.
something like 1.5MB/sec (megabytes). Note that this only counts incoming
traffic.
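
To put the two figures on a per-increment basis, here is a rough
back-of-the-envelope calculation. It is only a sketch: it assumes decimal
units (1KB = 1000 bytes), which the measurements above do not specify, and
the variable names are purely illustrative.

```python
# Rough per-increment traffic estimate from the figures quoted above.
KB = 1000  # assumption: decimal kilobytes; the measurement tools may differ

increments_per_sec = 100 * 100      # 100 clients x ~100 increments/sec each
client_bytes_per_sec = 700 * KB     # observed client-to-server traffic
cross_dc_bytes_per_sec = 1500 * KB  # observed incoming traffic at DC:2

client_bytes_per_increment = client_bytes_per_sec / increments_per_sec
cross_dc_bytes_per_increment = cross_dc_bytes_per_sec / increments_per_sec
ratio = cross_dc_bytes_per_sec / client_bytes_per_sec

print(f"client-to-server: ~{client_bytes_per_increment:.0f} bytes/increment")
print(f"cross-DC (incoming at DC:2): ~{cross_dc_bytes_per_increment:.0f} bytes/increment")
print(f"cross-DC / client ratio: ~{ratio:.1f}x")
```

Under these assumptions, that is about 70 bytes per increment on the client
side versus about 150 bytes per increment arriving at DC:2.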

I've taken a look at some of the captured packets and it looks like there's
much more service information in DC-to-DC traffic compared to
client-to-server traffic -- although I am by no means certain here.


Overall I have a couple of questions:
- Is it indeed the case that server-to-server replication traffic can be
significantly more bloated than client-to-server traffic? Or do I need to
review my testing methodology?
- Is there anything that can be done to reduce cross-DC replication traffic?
Perhaps some compression scheme? Or some delay before replication allowing
for possibly more increments to be merged together?


Best regards,
Sergey



--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-counters-replication-uses-more-traffic-than-client-increments-tp7584412.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
