incubator-cassandra-user mailing list archives

From Adi <adi.pan...@gmail.com>
Subject Re: Calculate number of nodes required based on data
Date Wed, 07 Sep 2011 16:56:05 GMT
On Tue, Sep 6, 2011 at 3:53 PM, Hefeng Yuan <hfyuan@rhapsody.com> wrote:

> Hi,
>
> Is there any suggested way of calculating number of nodes needed based on
> data?
>

> We currently have 6 nodes (each has 8G memory) with RF5 (because we want to
> be able to survive loss of 2 nodes).
> The flush of memtable happens around every 30 min (while not doing
> compaction), with ~9m serialized bytes.
>
> The problem is that we see more than 3 nodes doing compaction at the same
> time, which slows down the application.
> (tried to increase/decrease compaction_throughput_mb_per_sec, not helping
> much)
>
> So I'm thinking probably we should add more nodes, but not sure how many
> more to add.
> Based on the data rate, is there any suggested way of calculating number of
> nodes required?
>
> Thanks,
> Hefeng



What is the total amount of data?
What is the total amount in the biggest column family?

There is no hard limit per node. Cassandra gurus like more nodes :-). One
number I have seen mentioned in discussions for 'happy cassandra users' is
around 250-300 GB per node. You could store more per node by having
multiple column families, each holding around 250-300 GB. The main problem
with more data per node is that repairs, compactions, and similar operations
take longer and require much more spare disk space.
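
As a rough sketch of how that rule of thumb turns into a node count: the
snippet below multiplies the unique data size by the replication factor and
divides by a per-node target. The function name, the 500 GB figure, and the
choice of 250 and 300 GB as targets are illustrative assumptions, since the
actual data size is not given in this thread. It only covers storage
capacity and says nothing about compaction or write load.

```python
import math


def estimate_nodes(unique_data_gb, replication_factor, target_gb_per_node=300.0):
    """Rough node count so each node holds about target_gb_per_node (hypothetical helper)."""
    if unique_data_gb < 0 or replication_factor < 1 or target_gb_per_node <= 0:
        raise ValueError("sizes must be non-negative and RF/target must be positive")
    # Every row is stored replication_factor times across the cluster.
    replicated_gb = unique_data_gb * replication_factor
    by_capacity = math.ceil(replicated_gb / target_gb_per_node)
    # A cluster needs at least as many nodes as there are replicas of each row.
    return max(replication_factor, by_capacity)


if __name__ == "__main__":
    # Assumed 500 GB of unique data, with the RF of 5 mentioned in this thread.
    for target in (250.0, 300.0):
        print(target, estimate_nodes(500, 5, target))
```

With the assumed 500 GB and RF 5, this gives 10 nodes at 250 GB per node and
9 nodes at 300 GB per node. Substitute the real total to get a usable number.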

As for the application slowing down during compaction: what CL
(consistency level) are you using for reads and writes?
Also make sure it is not a client issue. Is your client hitting all nodes
round-robin, or in some other fashion?
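
To show why the CL matters here, a minimal sketch of the arithmetic: a
request at a given consistency level must be acknowledged by a fixed number
of replicas, so the CL decides how many slow (for example, compacting)
replicas a request can ignore. The helper names are made up for
illustration; RF 5 is the value from the original question.

```python
def quorum(replication_factor):
    """Replicas that must respond at QUORUM: a strict majority of the RF."""
    return replication_factor // 2 + 1


def tolerated_slow_replicas(replication_factor, required_replicas):
    """Replicas that can be down or slow while a request still succeeds."""
    if not 1 <= required_replicas <= replication_factor:
        raise ValueError("required_replicas must be between 1 and the RF")
    return replication_factor - required_replicas


if __name__ == "__main__":
    rf = 5  # replication factor from the original question
    levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    for name, needed in levels.items():
        spare = tolerated_slow_replicas(rf, needed)
        print(f"{name}: needs {needed} replicas, tolerates {spare} slow or down")
```

With RF 5, QUORUM needs 3 replicas and tolerates 2 that are slow or down,
which matches the goal of surviving the loss of 2 nodes. At ALL, a single
compacting replica is enough to slow every request that touches it.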

-Adi
