cassandra-user mailing list archives

From Adi <>
Subject Re: Calculate number of nodes required based on data
Date Wed, 07 Sep 2011 17:51:04 GMT
On Wed, Sep 7, 2011 at 1:09 PM, Hefeng Yuan <> wrote:

> Adi,
> The reason we're attempting to add more nodes is to solve the
> long/simultaneous compactions, i.e. the performance issue, not the storage
> issue yet.
> We have RF 5 and CL QUORUM for reads and writes, and we currently have 6
> nodes. When 4 nodes are compacting during the same period, we're screwed,
> especially on reads, since every read will hit at least one of the
> compacting nodes anyway.
> My assumption is that if we add more nodes, each node will have less load,
> and therefore need less compaction, and will probably compact faster, so we
> avoid having 4+ nodes compacting simultaneously.
> Any suggestions on how to calculate how many more nodes to add? Or, more
> generally, how to plan the number of nodes required from a performance
> perspective?
> Thanks,
> Hefeng
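A quick sketch of the arithmetic behind the quoted reasoning, assuming a
balanced ring where every node owns an equal share of the token range.
QUORUM for RF 5 is floor(5/2) + 1 = 3 replicas; with 4 of 6 nodes compacting
only 2 nodes are idle, so every quorum read must include at least one
compacting replica. The helper names are hypothetical, not Cassandra APIs.

```python
from fractions import Fraction


def quorum(rf: int) -> int:
    """Replicas that must answer a QUORUM request: floor(RF / 2) + 1."""
    return rf // 2 + 1


def unavoidable_busy_replicas(nodes: int, rf: int, busy: int) -> int:
    """Minimum number of compacting replicas in any QUORUM request.

    Even if the coordinator prefers idle replicas, a key's replica set holds
    at most min(idle nodes, RF) idle nodes; the rest of the quorum must come
    from nodes that are compacting.
    """
    idle = nodes - busy
    return max(0, quorum(rf) - min(idle, rf))


def per_node_share(nodes: int, rf: int) -> Fraction:
    """Fraction of the total (un-replicated) data each node stores, balanced ring."""
    return Fraction(rf, nodes)


if __name__ == "__main__":
    print(quorum(5))                           # 3
    print(unavoidable_busy_replicas(6, 5, 4))  # 1
    print(per_node_share(6, 5))                # 5/6
    print(per_node_share(10, 5))               # 1/2
```

Under this assumption each node stores RF/N of the data, so going from 6 to
10 nodes drops per-node data from 5/6 to 1/2 of the total. How much that
shortens compactions depends on the workload, which this sketch does not model.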
Adding nodes to delay and reduce compaction is an interesting performance
use case :-)  I think you can find a smarter/cheaper way to manage
compaction. Have you looked at:
a) increasing memtable throughput
What is the nature of your writes? Is it mostly inserts, or are there also
a lot of quick updates to recently inserted data? Increasing
memtable_throughput can delay, and maybe reduce, the compaction cost if you
have lots of updates to the same data. You will have to budget more memory
if you try this.
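To illustrate why a bigger memtable helps update-heavy workloads, here is a
toy model (not Cassandra's flush logic): the memtable is a dict keyed by row,
it flushes once it holds a fixed number of distinct rows, and an update to a
row already in memory simply replaces it. The 100-key workload and the limits
of 50 and 200 rows are made-up values, and `flush_stats` is a hypothetical
helper.

```python
from itertools import cycle, islice


def flush_stats(writes, memtable_limit):
    """Toy memtable that flushes when it holds `memtable_limit` distinct rows.

    Returns (number_of_flushes, total_rows_written_to_sstables).
    An update to a key already in memory replaces it there, so it never
    reaches disk as a separate row.
    """
    memtable = {}
    flushes = 0
    rows_flushed = 0
    for key, value in writes:
        memtable[key] = value  # later update wins in memory
        if len(memtable) >= memtable_limit:
            flushes += 1
            rows_flushed += len(memtable)
            memtable.clear()
    if memtable:  # flush whatever is left at the end of the stream
        flushes += 1
        rows_flushed += len(memtable)
    return flushes, rows_flushed


# Hypothetical workload: 1,000 updates cycling over 100 hot keys.
workload = [(key, i) for i, key in enumerate(islice(cycle(range(100)), 1000))]

print(flush_stats(workload, memtable_limit=50))   # (20, 1000)
print(flush_stats(workload, memtable_limit=200))  # (1, 100)
```

With the small memtable every update lands on disk and has to be merged
later by compaction; with the larger one the 100 hot rows are overwritten in
memory and flushed once. Real memtables flush on serialized size and other
triggers, not a count of distinct rows, so treat this as directional only.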
When you mentioned "with ~9m serialized bytes", was that the memtable
throughput? That is quite a low threshold, which will result in a large
number of SSTables needing to be compacted. I think the default is 256 MB,
and on the lower end the values I have seen are 64 MB or maybe 32 MB.
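A back-of-the-envelope count of how many SSTables each flush threshold
produces. The write volume of 2048 MB/hour for one column family is a
hypothetical figure, and the sketch assumes every flush happens exactly at
the threshold, with no overwrites collapsed in memory.

```python
import math


def sstables_per_hour(write_mb_per_hour: float, flush_threshold_mb: float) -> int:
    """Approximate memtable flushes (i.e. new SSTables) per hour for one CF.

    Assumes each flush happens exactly when the memtable reaches the
    throughput threshold and that no overwrites are collapsed in memory.
    """
    return math.ceil(write_mb_per_hour / flush_threshold_mb)


WRITE_MB_PER_HOUR = 2048  # hypothetical write volume for one column family

for threshold_mb in (9, 32, 64, 256):
    print(f"{threshold_mb:>3} MB threshold -> "
          f"{sstables_per_hour(WRITE_MB_PER_HOUR, threshold_mb)} SSTables/hour")
# 9 -> 228, 32 -> 64, 64 -> 32, 256 -> 8
```

At ~9 MB per flush that is over 200 new SSTables an hour for compaction to
merge, versus 8 at 256 MB.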

b) tweaking min_compaction_threshold and max_compaction_threshold
- increasing min_compaction_threshold will delay compactions
- decreasing max_compaction_threshold will reduce the number of SSTables
per compaction cycle
Are you using the defaults (4 and 32), or are you trying different values?
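A simplified model of how the two thresholds shape compaction within one
bucket of similarly sized SSTables (roughly what size-tiered compaction
groups together). It is not Cassandra's actual code; `compaction_batches` is
a hypothetical helper, and its defaults mirror the 4/32 values above.

```python
from typing import List


def compaction_batches(similar_sstables: int, min_threshold: int = 4,
                       max_threshold: int = 32) -> List[int]:
    """Split one bucket of similarly sized SSTables into compaction tasks.

    Simplified model: nothing happens until the bucket holds at least
    min_threshold SSTables; each task then merges at most max_threshold of
    them, and any remainder below min_threshold waits for more flushes.
    """
    # Constraint of this sketch only, so every task merges 2+ SSTables.
    if not 2 <= min_threshold <= max_threshold:
        raise ValueError("need 2 <= min_threshold <= max_threshold")
    batches = []
    remaining = similar_sstables
    while remaining >= min_threshold:
        batch = min(remaining, max_threshold)
        batches.append(batch)
        remaining -= batch
    return batches


print(compaction_batches(10))                   # [10]
print(compaction_batches(10, max_threshold=6))  # [6, 4]  smaller merges
print(compaction_batches(6, min_threshold=8))   # []      compaction delayed
```

Raising the minimum lets more SSTables pile up before anything runs, which
also means reads may have to consult more SSTables in the meantime.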

c) splitting column families
Splitting column families can also help, because compactions occur
serially, one CF at a time, which spreads your compaction cost out over
time and across column families. It requires a change in app logic, though.

