cassandra-user mailing list archives

From David Hawthorne <>
Subject Re: Replicate On Write behavior
Date Thu, 08 Sep 2011 19:48:43 GMT
It was exactly due to CASSANDRA-2890, and the fact that the first replica is always the one
with the lowest IP address.  I patched Cassandra to pick a random node out of the replica set
in findSuitableEndpoint:

Random rng = new Random();

return endpoints.get(rng.nextInt(endpoints.size()));  // instead of return endpoints.get(0);

Now the workload is evenly balanced among all 5 nodes and I'm getting 2.5x the inserts/sec throughput.
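
To illustrate the imbalance and what the patch changes, here is a minimal, self-contained simulation. It is not Cassandra code: replicasFor and countPicks are hypothetical helpers, and it assumes RF=3, each range replicated on its owner plus the next RF-1 nodes on the ring, node addresses that increase in ring order, and writes spread evenly over the ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ReplicaChoiceSketch {

    /** Replica set for the range owned by node `owner`: the owner plus the next rf-1 nodes on the ring. */
    static List<Integer> replicasFor(int owner, int nodeCount, int rf) {
        List<Integer> replicas = new ArrayList<>();
        for (int i = 0; i < rf; i++) {
            replicas.add((owner + i) % nodeCount);
        }
        // Assumption: the list reaches the caller sorted by address, lowest first,
        // and the node index doubles as the address order.
        Collections.sort(replicas);
        return replicas;
    }

    /** Counts how often each node is picked to do the disk work over `writes` evenly spread writes. */
    static int[] countPicks(int nodeCount, int rf, int writes, Random rng) {
        int[] picks = new int[nodeCount];
        for (int w = 0; w < writes; w++) {
            List<Integer> endpoints = replicasFor(w % nodeCount, nodeCount, rf);
            // rng == null models the unpatched "return endpoints.get(0)";
            // otherwise a random member of the replica set is chosen, as in the patch.
            int chosen = (rng == null)
                    ? endpoints.get(0)
                    : endpoints.get(rng.nextInt(endpoints.size()));
            picks[chosen]++;
        }
        return picks;
    }

    public static void main(String[] args) {
        int nodes = 5, rf = 3, writes = 30_000;
        System.out.println("first replica:  " + Arrays.toString(countPicks(nodes, rf, writes, null)));
        System.out.println("random replica: " + Arrays.toString(countPicks(nodes, rf, writes, new Random(42))));
    }
}
```

Under these assumptions the first-replica policy gives [18000, 6000, 6000, 0, 0] on 5 nodes: one node takes 3/5 of the picks and two nodes take none. The random policy spreads the picks over all five nodes. The sketch reuses one Random instance across calls rather than creating one per call, which is a design choice of the sketch and not something the patch above does.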

Here's the behavior I saw, and "disk work" refers to the ReplicateOnWrite load of a counter

One node will get RF/n of the disk work.  Two nodes will always get 0 disk work.

In a 3 node cluster, 1 node gets hit really hard on disk.  You get the performance of a
one-node cluster.
In a 6 node cluster, 1 node gets hit with 50% of the disk work, giving you the performance
of a ~2 node cluster.
In a 10 node cluster, 1 node gets 30% of the disk work, giving you the performance of a ~3
node cluster.

I confirmed this behavior with cluster sizes of 3, 4, and 5 nodes.
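
The arithmetic behind those figures can be sketched as follows. This is only a reading of the rule of thumb above, assuming RF=3 (which is what the 50% and 30% figures imply) and assuming the hot node is the bottleneck, so overall throughput is roughly one node's capacity divided by its share of the work. hotNodeShare and effectiveNodes are hypothetical names.

```java
final class HotNodeShareSketch {

    /** Fraction of the disk work landing on the single hot node: RF/n, capped at 1. */
    static double hotNodeShare(int rf, int nodeCount) {
        return Math.min(1.0, (double) rf / nodeCount);
    }

    /** Rough cluster size you effectively get when the hot node bounds throughput: n/RF. */
    static double effectiveNodes(int rf, int nodeCount) {
        return 1.0 / hotNodeShare(rf, nodeCount);
    }

    public static void main(String[] args) {
        int rf = 3; // assumption inferred from the 50% and 30% figures
        for (int n : new int[] {3, 6, 10}) {
            System.out.printf("n=%d: hot node share %.0f%%, roughly a %.1f-node cluster%n",
                    n, 100 * hotNodeShare(rf, n), effectiveNodes(rf, n));
        }
    }
}
```

For 3, 6, and 10 nodes this works out to shares of 100%, 50%, and 30%, and effective sizes of about 1, 2, and 3.3 nodes.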

>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite
>> Completed tasks in nodetool tpstats output.  Is that normal?  I'm using RandomPartitioner...
>> Address         DC          Rack        Status State   Load            Owns    Token
>>                                                                            136112946768375385385349842972707284580
>>    datacenter1 rack1       Up     Normal  2.26 GB         20.00%  0
>>    datacenter1 rack1       Up     Normal  2.47 GB         20.00%  34028236692093846346337460743176821145
>>    datacenter1 rack1       Up     Normal  2.52 GB         20.00%  68056473384187692692674921486353642290
>>    datacenter1 rack1       Up     Normal  950.97 MB       20.00%  102084710076281539039012382229530463435
>>    datacenter1 rack1       Up     Normal  383.25 MB       20.00%  136112946768375385385349842972707284580
>> The nodes with ReplicateOnWrites are the 3 in the middle.  The first node and last
>> node both have a count of 0.  This is a clean cluster, and I've been doing 3k ... 2.5k (decaying
>> performance) inserts/sec for the last 12 hours.  The last time this test ran, it went all
>> the way down to 500 inserts/sec before I killed it.
> Could be due to
> --
> Sylvain
