incubator-cassandra-user mailing list archives

From David Hawthorne <dha...@gmx.3crowd.com>
Subject Re: Replicate On Write behavior
Date Thu, 08 Sep 2011 19:48:43 GMT
It was exactly due to CASSANDRA-2890, combined with the fact that the first replica is always
the one with the lowest IP address. I patched Cassandra to pick a random node out of the
replica set in StorageProxy.java, in findSuitableEndpoint:

// StorageProxy.findSuitableEndpoint (requires import java.util.Random)
Random rng = new Random();
return endpoints.get(rng.nextInt(endpoints.size()));  // instead of: return endpoints.get(0);

Now the workload is evenly balanced across all 5 nodes, and I'm getting 2.5x the insert throughput (inserts/sec).
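The patch above only changes which replica gets picked. As a standalone sketch (the class RandomReplicaChooser and its methods are hypothetical, not Cassandra's actual StorageProxy code), the before and after choices look like this. Unlike the patch, the sketch reuses one Random instead of allocating a new one on every call; that is a choice of the sketch, not of the original change.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical standalone sketch of the replica choice; not Cassandra's StorageProxy code.
public class RandomReplicaChooser {
    // One shared generator rather than a new Random per call (java.util.Random is thread-safe).
    private static final Random SHARED_RNG = new Random();

    // Behaviour before the patch: always the first endpoint in the list.
    static <T> T pickFirst(List<T> endpoints) {
        return endpoints.get(0);
    }

    // Behaviour after the patch: every endpoint is equally likely to be chosen.
    static <T> T pickRandom(List<T> endpoints, Random rng) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no endpoints to choose from");
        }
        return endpoints.get(rng.nextInt(endpoints.size()));
    }

    static <T> T pickRandom(List<T> endpoints) {
        return pickRandom(endpoints, SHARED_RNG);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.54", "10.0.0.55", "10.0.0.72");
        int[] counts = new int[replicas.size()];
        for (int i = 0; i < 30000; i++) {
            counts[replicas.indexOf(pickRandom(replicas))]++;
        }
        // pickFirst would send all 30000 picks to the first entry; pickRandom spreads them out.
        System.out.println("first: " + pickFirst(replicas)
                + ", random counts: " + Arrays.toString(counts));
    }
}
```

On Java 7 and later, ThreadLocalRandom.current().nextInt(endpoints.size()) is another option that avoids contention on a single shared generator.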

Here's the behavior I saw; "disk work" refers to the ReplicateOnWrite load generated by a
counter insert:

One node gets RF/n of the disk work (RF = replication factor, n = number of nodes). Two nodes
always get no disk work. With RF = 3:

In a 3-node cluster, 1 node gets hit really hard on disk. You get the performance of a one-node cluster.
In a 6-node cluster, 1 node gets 50% of the disk work, giving you the performance of a ~2-node cluster.
In a 10-node cluster, 1 node gets 30% of the disk work, giving you the performance of a ~3-node cluster.

I confirmed this behavior with cluster sizes of 3, 4, and 5 nodes.
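To see where RF/n comes from, here is a small Java model of the behavior described above. It is a sketch under stated assumptions, not Cassandra code: every node owns an equal 1/n slice of the ring, the replicas of a node's slice are that node plus the next RF-1 nodes clockwise (SimpleStrategy-style placement), and, as reported above, the replica with the lowest IP address always does the ReplicateOnWrite work. The class LeaderLoadModel and the integer "IP ranks" standing in for addresses are hypothetical.

```java
import java.util.Arrays;

// Hypothetical model of which replica does the ReplicateOnWrite work.
// Assumptions: equal 1/n ownership per node; replicas of node i's slice are nodes
// i, i+1, ..., i+rf-1 (mod n) in ring order; ipRank[i] orders node i's IP (lower = lower IP).
public class LeaderLoadModel {

    // Share of the work per node when the lowest-IP replica always leads.
    static double[] lowestIpShares(int[] ipRank, int rf) {
        int n = ipRank.length;
        double[] share = new double[n];
        for (int primary = 0; primary < n; primary++) {
            int leader = primary;
            for (int k = 1; k < rf; k++) {
                int node = (primary + k) % n;
                if (ipRank[node] < ipRank[leader]) {
                    leader = node;
                }
            }
            share[leader] += 1.0 / n;
        }
        return share;
    }

    // Share per node when a replica is picked uniformly at random (the patched behavior).
    static double[] randomShares(int n, int rf) {
        double[] share = new double[n];
        for (int primary = 0; primary < n; primary++) {
            for (int k = 0; k < rf; k++) {
                share[(primary + k) % n] += 1.0 / (n * rf);
            }
        }
        return share;
    }

    static double max(double[] xs) {
        return Arrays.stream(xs).max().getAsDouble();
    }

    public static void main(String[] args) {
        // Ring (token) order of the 5-node cluster quoted below, by last IP octet.
        int[] ring = {57, 56, 55, 54, 72};
        System.out.println(Arrays.toString(lowestIpShares(ring, 3)));
        for (int n : new int[] {3, 6, 10}) {
            int[] seq = new int[n];
            for (int i = 0; i < n; i++) {
                seq[i] = i; // IPs ascending in ring order
            }
            System.out.printf("n=%d hottest node: lowest-IP %.2f, random %.2f%n",
                    n, max(lowestIpShares(seq, 3)), max(randomShares(n, 3)));
        }
    }
}
```

Fed the ring order from the quoted nodetool output, the model gives 10.0.0.54 a 0.6 share (RF/n = 3/5), 10.0.0.55 and 10.0.0.56 0.2 each, and 0 for 10.0.0.57 and 10.0.0.72, consistent with only the 3 middle nodes showing ReplicateOnWrite tasks. For sequential IPs with n = 3, 6 and 10, the hottest node's share is 1.0, 0.5 and 0.3, while random choice gives every node 1/n.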


> 
>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite
>> Completed tasks in nodetool tpstats output.  Is that normal?  I'm using RandomPartitioner...
>> 
>> Address      DC          Rack   Status State   Load       Owns    Token
>>                                                                   136112946768375385385349842972707284580
>> 10.0.0.57    datacenter1 rack1  Up     Normal  2.26 GB    20.00%  0
>> 10.0.0.56    datacenter1 rack1  Up     Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
>> 10.0.0.55    datacenter1 rack1  Up     Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
>> 10.0.0.54    datacenter1 rack1  Up     Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
>> 10.0.0.72    datacenter1 rack1  Up     Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
>> 
>> The nodes with ReplicateOnWrites are the 3 in the middle.  The first node and the last
>> node both have a count of 0.  This is a clean cluster, and I've been doing 3k inserts/sec,
>> decaying to 2.5k, for the last 12 hours.  The last time this test ran, it went all the way
>> down to 500 inserts/sec before I killed it.
> 
> Could be due to https://issues.apache.org/jira//browse/CASSANDRA-2890.
> 
> --
> Sylvain
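The tokens in the quoted nodetool ring output are evenly spaced over RandomPartitioner's token range of 0 to 2^127, which is why every node reports Owns 20.00%. With ownership balanced, the skewed ReplicateOnWrite counts point at replica choice rather than token placement. A minimal Java sketch (the class name BalancedTokens is hypothetical) reproduces those tokens with the usual i * (2^127 / n) formula, using truncating integer division:

```java
import java.math.BigInteger;

// Sketch: evenly spaced initial tokens for RandomPartitioner, whose tokens lie in [0, 2^127].
public class BalancedTokens {
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127); // 2^127

    // Token for node i is i * (2^127 / n), with the division truncated to an integer.
    static BigInteger[] tokens(int n) {
        BigInteger step = RANGE.divide(BigInteger.valueOf(n));
        BigInteger[] out = new BigInteger[n];
        for (int i = 0; i < n; i++) {
            out[i] = step.multiply(BigInteger.valueOf(i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (BigInteger token : tokens(5)) {
            System.out.println(token);
        }
    }
}
```

For n = 5 this prints exactly the five tokens in the listing, from 0 up to 136112946768375385385349842972707284580.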

