cassandra-user mailing list archives

From Hefeng Yuan <>
Subject Re: Calculate number of nodes required based on data
Date Wed, 07 Sep 2011 18:09:55 GMT
We didn't change MemtableThroughputInMB/min/maxCompactionThreshold, they're 499/4/32.
As for why we're flushing at ~9m, I guess it has to do with this:
The only parameter I tried to play with is compaction_throughput_mb_per_sec. I tried halving
it and doubling it, but neither seems to help avoid the simultaneous compactions on

I agree that we don't necessarily need to add nodes, as long as we have a way to avoid simultaneous
compactions on 4+ nodes.
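
To make the 4+ node figure concrete, here is a rough Python sketch of the QUORUM arithmetic for the setup described in the quoted message below (RF 5, 6 nodes). The function names are hypothetical, and the model assumes the coordinator always prefers idle replicas and falls back to compacting ones only when it must; real replica selection may behave differently.

```python
from typing import Tuple


def quorum(rf: int) -> int:
    """Replicas that must answer a QUORUM request: a majority of RF."""
    return rf // 2 + 1


def compacting_replicas_in_quorum(nodes: int, rf: int, compacting: int) -> Tuple[int, int]:
    """Return (best_case, worst_case) count of compacting replicas that a
    QUORUM request must wait on, taken over all possible replica sets.

    Assumption (hypothetical model): idle replicas are always tried first.
    """
    q = quorum(rf)
    idle_nodes = nodes - compacting
    most_idle_replicas = min(rf, idle_nodes)        # luckiest replica set
    fewest_idle_replicas = max(0, rf - compacting)  # unluckiest replica set
    best = max(0, q - most_idle_replicas)
    worst = max(0, q - fewest_idle_replicas)
    return best, worst


if __name__ == "__main__":
    # Sweep the number of compacting nodes in a 6-node, RF 5 cluster.
    for busy in range(0, 7):
        print(busy, compacting_replicas_in_quorum(nodes=6, rf=5, compacting=busy))
```

Under this model, with 4 of 6 nodes compacting, even the luckiest key needs one compacting replica to reach a quorum of 3. With 3 compacting, only some keys are affected.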


On Sep 7, 2011, at 10:51 AM, Adi wrote:

> On Wed, Sep 7, 2011 at 1:09 PM, Hefeng Yuan <> wrote:
> > Adi,
> > The reason we're attempting to add more nodes is to solve the long/simultaneous
> > compactions, i.e. the performance issue, not the storage issue yet.
> > We have RF 5 and CL QUORUM for reads and writes. We currently have 6 nodes, and when 4
> > nodes are compacting during the same period, we're screwed, especially on reads, since any
> > read will hit at least one of the compacting nodes anyway.
> > My assumption is that if we add more nodes, each node will have less load and therefore
> > need less compaction, and will probably compact faster, eternally avoid 4+ nodes doing compaction
> > Any suggestions on how to calculate how many more nodes to add? Or, more generally, how to plan
> > for the number of nodes required, from a performance perspective?
> > Thanks,
> > Hefeng
>
> Adding nodes to delay and reduce compaction is an interesting performance use case :-)
> I am thinking you can find a smarter/cheaper way to manage that.
> Have you looked at
> a) increasing memtable throughput
> What is the nature of your writes? Are they mostly inserts, or are there also a lot of quick updates
> to recently inserted data? Increasing memtable_throughput can delay and maybe reduce the compaction
> cost if you have lots of updates to the same data. You will have to provide for memory if you try
> When you mentioned "with ~9m serialized bytes", is that the memtable throughput? That is quite
> a low threshold, which will result in a large number of SSTables needing to be compacted. I think
> the default is 256 MB, and the lowest values I have seen are 64 MB or maybe 32 MB.
> b) tweaking min_compaction_threshold and max_compaction_threshold
> - increasing min_compaction_threshold will delay compactions
> - decreasing max_compaction_threshold will reduce the number of SSTables per compaction cycle
> Are you using the defaults (4 and 32), or are you trying different values?
> c) splitting column families
> Splitting column families can also help, because compactions occur serially, one
> CF at a time, which spreads your compaction cost over time and across column families. It requires
> a change in app logic, though.
> -Adi
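
As a rough illustration of Adi's point about the flush threshold, the Python sketch below estimates how many SSTables get flushed, and how many minor compactions follow, for a flush size of ~9 MB versus 256 MB. The 10 GB write volume and the function names are hypothetical. The compaction model is a deliberate simplification: equal-sized SSTables, merged exactly min_compaction_threshold at a time, leftovers ignored.

```python
import math


def sstables_flushed(written_mb: float, flush_threshold_mb: float) -> int:
    """SSTables produced when a memtable is flushed every flush_threshold_mb."""
    return math.ceil(written_mb / flush_threshold_mb)


def tiered_compactions(sstables: int, min_threshold: int = 4) -> int:
    """Count minor compactions under a simplified tiered model: every
    min_threshold same-sized SSTables merge into one larger SSTable, and the
    merged SSTables are compacted again tier by tier. Leftover SSTables that
    do not fill a group are ignored."""
    total = 0
    while sstables >= min_threshold:
        merged = sstables // min_threshold
        total += merged
        sstables = merged
    return total


if __name__ == "__main__":
    written_mb = 10 * 1024  # hypothetical: 10 GB written to one column family
    for flush_mb in (9, 256):
        flushed = sstables_flushed(written_mb, flush_mb)
        print(flush_mb, flushed, tiered_compactions(flushed))
```

In this simplified model the smaller flush size produces far more SSTables and compaction runs for the same volume of writes, which is the effect Adi is warning about.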
