incubator-cassandra-user mailing list archives

From Hefeng Yuan <hfy...@rhapsody.com>
Subject Re: Calculate number of nodes required based on data
Date Wed, 07 Sep 2011 17:09:42 GMT
Adi,

The reason we're trying to add more nodes is to fix the long, simultaneous compactions,
i.e. the performance issue, not a storage issue (yet).
We use RF 5 with CL QUORUM for both reads and writes, and we currently have 6 nodes. When 4 nodes
are compacting at the same time, we're in trouble, especially on reads: only 2 nodes are idle,
so a QUORUM read (3 of 5 replicas) has to hit at least one compacting node anyway.
My assumption is that if we add more nodes, each node will carry less load and therefore need
less compaction, and will probably compact faster, so we would avoid ever having 4+ nodes
compacting simultaneously.
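
To make the quorum arithmetic above concrete, here is a minimal sketch. The function names are made up for illustration, and it assumes the best case where a key's replicas land on as many idle (non-compacting) nodes as possible; real replica placement depends on the ring layout.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at CL QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def min_compacting_in_quorum(nodes: int, rf: int, compacting: int) -> int:
    """Fewest compacting replicas a QUORUM request must touch (best case).

    Assumption: as many of the key's rf replicas as possible sit on idle
    nodes. Actual placement depends on token assignment.
    """
    idle = nodes - compacting          # nodes not compacting right now
    idle_replicas = min(rf, idle)      # replicas that can be on idle nodes
    return max(0, quorum(rf) - idle_replicas)


if __name__ == "__main__":
    # Current cluster: 6 nodes, RF 5, 4 nodes compacting.
    print(min_compacting_in_quorum(6, 5, 4))
    # Same cluster with only 3 nodes compacting.
    print(min_compacting_in_quorum(6, 5, 3))
    # Hypothetical larger cluster: 9 nodes, still 4 compacting.
    print(min_compacting_in_quorum(9, 5, 4))
```

This only says whether a quorum of idle replicas can exist for some key; it does not predict how often 4+ nodes will compact at once after adding nodes.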

Any suggestions on how to calculate how many more nodes to add? Or, more generally, how to plan
the number of nodes required from a performance perspective?

Thanks,
Hefeng

On Sep 7, 2011, at 9:56 AM, Adi wrote:

> On Tue, Sep 6, 2011 at 3:53 PM, Hefeng Yuan <hfyuan@rhapsody.com> wrote:
> Hi,
> 
> Is there any suggested way of calculating number of nodes needed based on data?
>  
> We currently have 6 nodes (each has 8G memory) with RF5 (because we want to be able to
> survive loss of 2 nodes).
> The flush of memtable happens around every 30 min (while not doing compaction), with
> ~9m serialized bytes.
> 
> The problem is that we see more than 3 nodes doing compaction at the same time, which
> slows down the application.
> (tried to increase/decrease compaction_throughput_mb_per_sec, not helping much)
> 
> So I'm thinking probably we should add more nodes, but not sure how many more to add.
> Based on the data rate, is there any suggested way of calculating number of nodes required?
> 
> Thanks,
> Hefeng
> 
> 
> What is the total  amount of data?
> What is the total amount in the biggest column family?
> 
> There is no hard limit per node. Cassandra gurus like more nodes :-). One number for
> 'happy cassandra users' I have seen mentioned in discussions is around 250-300 GB per node.
> But you could store more per node by having multiple column families each storing around 250-300
> GB per column family. The main problem being repair/compactions and such operations taking
> longer and requiring much more spare disk space.
> 
> As for slow down in application during compaction I was wondering 
> what is the CL you are using for read and writes?
> Make sure it is not a client issue - Is your client hitting all nodes in round-robin
> or some other fashion?
> 
> -Adi
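
As a rough way to turn the 250-300 GB per node rule of thumb quoted above into a node count, a sketch like the following could be used. It is only a storage-side estimate, not a compaction-performance model. The assumptions are mine: the per-node figure is taken to mean on-disk load including replicas, data is spread evenly across nodes, and the 600 GB total is a made-up input because the total data size is not stated in this thread.

```python
import math


def nodes_for_storage(unique_data_gb: float, rf: int,
                      per_node_gb: float = 250.0) -> int:
    """Estimate node count so that each node holds about per_node_gb.

    Assumptions (not from the thread): data is evenly distributed, and
    per_node_gb counts replicated data stored on the node.
    """
    total_on_disk_gb = unique_data_gb * rf           # every row is stored rf times
    by_storage = math.ceil(total_on_disk_gb / per_node_gb)
    return max(rf, by_storage)                       # need at least rf nodes to hold rf replicas


if __name__ == "__main__":
    # Hypothetical: 600 GB of unique data at RF 5, targeting 250 GB per node.
    print(nodes_for_storage(600, 5, 250))
```

Lowering `per_node_gb` leaves more headroom for repair and compaction, which need spare disk space, at the cost of more nodes.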

