cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Replicate On Write behavior
Date Fri, 09 Sep 2011 07:17:13 GMT
We'll solve #2890 and we should have done it sooner.

That being said, a quick question: how do you do your inserts from the
clients? Are you evenly distributing the inserts among the nodes, or are
you always hitting the same coordinator?

Because, provided the nodes are correctly distributed on the ring, if you
distribute the insert (increment) requests across the nodes (again, I'm
talking of client requests), you "should" not see the behavior you observe.
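One simple way to distribute client requests as suggested above is a round-robin rotation over all node addresses, so that each node takes a turn as coordinator. This is an illustrative sketch only; the class and method names are invented here, not taken from any real client driver:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin host selector: a sketch of how a client can spread
// coordinator load across all nodes instead of always hitting one host.
public class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger index = new AtomicInteger();

    public RoundRobinHosts(List<String> hosts) {
        this.hosts = hosts;
    }

    // Each call returns the next host in rotation, so successive
    // increment requests land on different coordinators.
    // floorMod keeps the index valid even after integer overflow.
    public String nextHost() {
        int i = Math.floorMod(index.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }
}
```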

--
Sylvain

On Thu, Sep 8, 2011 at 9:48 PM, David Hawthorne <dhawth@gmx.3crowd.com> wrote:
> It was exactly due to 2890, and the fact that the first replica is always the one with
> the lowest-value IP address.  I patched Cassandra to pick a random node out of the replica
> set in StorageProxy.java's findSuitableEndpoint:
>
> Random rng = new Random();
>
> return endpoints.get(rng.nextInt(endpoints.size()));  // instead of return endpoints.get(0);
>
> Now the workload is evenly balanced among all 5 nodes and I'm getting 2.5x the inserts/sec
> throughput.
>
> Here's the behavior I saw ("disk work" refers to the ReplicateOnWrite load of a counter
> insert):
>
> One node will get RF/n of the disk work.  Two nodes will always get 0 disk work.
>
> In a 3-node cluster, 1 node gets its disk hit really hard: you get the performance of a
> one-node cluster.
> In a 6-node cluster, 1 node gets hit with 50% of the disk work, giving you the performance
> of a ~2-node cluster.
> In a 10-node cluster, 1 node gets 30% of the disk work, giving you the performance of
> a ~3-node cluster.
>
> I confirmed this behavior with 3-, 4-, and 5-node cluster sizes.
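The one-line patch quoted above can be fleshed out as a self-contained sketch. Note this simplifies endpoints to Strings for illustration; the real findSuitableEndpoint in StorageProxy operates on replica addresses:

```java
import java.util.List;
import java.util.Random;

// Sketch of the change described above: instead of always returning the
// first (lowest-IP) replica as the counter write leader, pick one of the
// replicas uniformly at random so ReplicateOnWrite load spreads evenly.
public class SuitableEndpoint {
    public static String pickRandom(List<String> endpoints, Random rng) {
        // Before the patch this was: return endpoints.get(0);
        return endpoints.get(rng.nextInt(endpoints.size()));
    }
}
```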
>
>
>>
>>> On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite
>>> Completed tasks in nodetool tpstats output.  Is that normal?  I'm using RandomPartitioner...
>>>
>>> Address      DC          Rack   Status State   Load       Owns    Token
>>>                                                                   136112946768375385385349842972707284580
>>> 10.0.0.57    datacenter1 rack1  Up     Normal  2.26 GB    20.00%  0
>>> 10.0.0.56    datacenter1 rack1  Up     Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
>>> 10.0.0.55    datacenter1 rack1  Up     Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
>>> 10.0.0.54    datacenter1 rack1  Up     Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
>>> 10.0.0.72    datacenter1 rack1  Up     Normal  383.25 MB  20.00%  136112946768375385385349842972707284580
>>>
>>> The nodes with ReplicateOnWrites are the 3 in the middle.  The first node and
>>> last node both have a count of 0.  This is a clean cluster, and I've been doing 3k ... 2.5k
>>> (decaying performance) inserts/sec for the last 12 hours.  The last time this test ran, it
>>> went all the way down to 500 inserts/sec before I killed it.
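For reference, the evenly spaced tokens in the ring output above follow directly from RandomPartitioner's token space of [0, 2^127): node i of n is assigned token i * (2^127 / n), using integer division. A small sketch of that calculation:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Computes evenly spaced RandomPartitioner tokens for an n-node ring.
// The token space is [0, 2^127); each node owns a 1/n slice, which is
// why each node in the ring output above shows "Owns 20.00%".
public class BalancedTokens {
    public static List<BigInteger> tokens(int nodeCount) {
        BigInteger range = BigInteger.valueOf(2).pow(127);
        BigInteger step = range.divide(BigInteger.valueOf(nodeCount));
        List<BigInteger> result = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            result.add(step.multiply(BigInteger.valueOf(i)));
        }
        return result;
    }
}
```

With nodeCount = 5 this reproduces the five tokens shown in the ring output.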
>>
>> Could be due to https://issues.apache.org/jira/browse/CASSANDRA-2890.
>>
>> --
>> Sylvain
>
>
