cassandra-user mailing list archives

From buddhasystem <>
Subject Re: How bad is the impact of compaction on performance?
Date Sat, 05 Feb 2011 17:48:04 GMT

Thanks Edward. In our usage scenario there is never any downtime; it's a global
24/7 operation.

Which is impacted worse, reads or writes?

How does a node handle compaction when there is a spike of writes coming to it?

Edward Capriolo wrote:
> On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem <> wrote:
>> Just wanted to see if someone with experience in running an actual service
>> can advise me: how often do you run nodetool compact on your nodes? Do you
>> stagger it in time, for each node? How badly is performance affected?
>> I know this all seems too generic, but then again no two clusters are
>> created equal anyhow. Just wanted to get a feel.
>> Thanks,
>> Maxim
>> --
>> View this message in context:
>> Sent from the mailing list archive at
> This is an interesting topic. Cassandra can now remove tombstones during
> non-major compactions. For some use cases you may not have to trigger
> nodetool compact yourself to remove tombstones. Use cases that do not do
> many updates or deletes may have the least need to run compaction
> yourself.
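A toy Python sketch (not Cassandra's code) of why compaction is where tombstones go away: SSTables are merged, the newest version of each row wins, and a tombstone older than a grace period can then be dropped. The `Cell`/`SSTable` types are hypothetical, and the toy uses the write timestamp as the deletion time. A real minor compaction also has to be sure no SSTable left out of the merge still holds an older version of the row before it can drop the tombstone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    """One version of a row in the toy model (hypothetical, not Cassandra's classes)."""
    timestamp: int        # write/delete time in seconds; the toy also treats it as the deletion time
    value: Optional[str]  # None marks a tombstone (a delete)


SSTable = Dict[str, Cell]  # row key -> the cell this SSTable holds for that key


def compact(sstables: List[SSTable], now: int, gc_grace: int) -> SSTable:
    """Merge SSTables into one: the newest timestamp wins for each key, and
    tombstones older than gc_grace seconds are purged from the output."""
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            current = merged.get(key)
            if current is None or cell.timestamp > current.timestamp:
                merged[key] = cell
    # Dropping expired tombstones is only safe here because every SSTable
    # that holds the key takes part in this merge.
    return {
        key: cell
        for key, cell in merged.items()
        if not (cell.value is None and now - cell.timestamp > gc_grace)
    }


# Usage: "a" was deleted at t=200, long before now=1000, so it disappears entirely.
older = {"a": Cell(100, "x"), "b": Cell(100, "y")}
newer = {"a": Cell(200, None)}
print(compact([older, newer], now=1000, gc_grace=300))  # {'b': Cell(timestamp=100, value='y')}
```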
> !However! If you have smaller SSTables, or fewer SSTables, your read
> operations will be more efficient.
> If you have downtime, such as from 1AM to 6AM, going through a major
> compaction might shrink your dataset significantly, and that will make
> reads better.
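A toy Python model of why fewer SSTables help reads: a row written across several flushes ends up in several SSTables, and a read has to touch each of them to assemble the row. After a major compaction the row lives in one SSTable. The layout is hypothetical and assumes perfect bloom filters, so an SSTable that lacks the row costs nothing.

```python
from typing import Dict, List, Set

# Toy model (hypothetical): an SSTable maps a row key to the column names it holds for that row.
SSTable = Dict[str, List[str]]


def seeks_for_read(sstables: List[SSTable], key: str) -> int:
    """Number of SSTables a read must touch to assemble `key`.
    Assumes perfect bloom filters, so SSTables without the key cost nothing."""
    return sum(1 for table in sstables if key in table)


def major_compact(sstables: List[SSTable]) -> List[SSTable]:
    """Merge every SSTable into one, unioning each row's columns."""
    merged: Dict[str, Set[str]] = {}
    for table in sstables:
        for key, columns in table.items():
            merged.setdefault(key, set()).update(columns)
    return [{key: sorted(columns) for key, columns in merged.items()}]


# Usage: a row written in three separate flushes is spread over three SSTables.
tables = [{"user1": ["name"]}, {"user1": ["email"], "user2": ["name"]}, {"user1": ["age"]}]
print(seeks_for_read(tables, "user1"))                 # 3
print(seeks_for_read(major_compact(tables), "user1"))  # 1
```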
> Compaction can be more or less intensive. The largest factor is row
> size: users with large rows probably see faster compaction, while users
> with smaller rows see it take a long time. You can lower the priority of
> the compaction thread for experimentation.
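One way to picture the row-size effect is a simple cost model, sketched below in Python. It assumes (hypothetically) that compaction pays a fixed cost per row plus a cost per byte; the constants are made up, not measured Cassandra figures. Under that assumption the same amount of data split into many small rows pays the per-row cost far more often.

```python
def estimated_compaction_seconds(total_bytes: int, row_bytes: int,
                                 per_row_overhead_s: float, per_byte_s: float) -> float:
    """Hypothetical cost model: a fixed cost per row plus a cost per byte.
    Both constants are illustrative inputs, not measured figures."""
    rows = total_bytes // row_bytes
    return rows * per_row_overhead_s + total_bytes * per_byte_s


# Same 1 GB of data, laid out as 1 MB rows versus 100-byte rows.
large_rows = estimated_compaction_seconds(10**9, 10**6, 1e-5, 1e-8)  # about 10.01 s
small_rows = estimated_compaction_seconds(10**9, 100, 1e-5, 1e-8)    # about 110 s
print(large_rows < small_rows)  # True
```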
> As for performance, you want to get your cluster to a state where
> it is not compacting often. This may mean you need more nodes to
> handle writes.
> I graph the compaction information from JMX to get a feel for how
> often a node is compacting on average. I also cross-reference the
> compaction graphs with the read latency and IO graphs I have to see
> what impact compaction has on reads.
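The cross-referencing step can be sketched in Python as splitting read-latency samples by whether a compaction was running at the time. It assumes the samples were already collected by polling JMX at a fixed interval; the `(time, latency_ms, pending_compactions)` layout is hypothetical, and the numbers in the usage line are made up.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# One sample: (unix_time, read_latency_ms, pending_compactions). The field layout is
# hypothetical; fill it from whatever JMX polling or graphing you already have.
Sample = Tuple[int, float, int]


def _mean_or_nan(values: Sequence[float]) -> float:
    # statistics.mean raises on an empty list, so report NaN instead.
    return mean(values) if values else float("nan")


def latency_by_compaction(samples: List[Sample]) -> Tuple[float, float]:
    """Return (mean read latency while compactions are pending,
    mean read latency while none are pending)."""
    during = [lat for _, lat, pending in samples if pending > 0]
    idle = [lat for _, lat, pending in samples if pending == 0]
    return _mean_or_nan(during), _mean_or_nan(idle)


# Usage with made-up samples taken once a minute.
samples = [(0, 2.0, 0), (60, 2.0, 0), (120, 6.0, 1), (180, 4.0, 2)]
print(latency_by_compaction(samples))  # (5.0, 2.0)
```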
> Forcing a major compaction also lowers the chance that a compaction
> will happen during the day at peak time. I major compact a few cluster
> nodes each night through cron (gc time 3 days). This has been good for
> keeping our data on disk as small as possible. Forcing the major
> compaction at night uses IO, but I find it saves IO over the course of
> the day because each read seeks less on disk.
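A Python sketch of the staggering idea: generate crontab lines that give each node one night of the week for `nodetool -h <host> compact`, so only a few nodes major-compact on any night. The host names, the 2AM start hour, and running all entries from a single admin machine with `nodetool` on its `PATH` are assumptions for illustration.

```python
from typing import List


def staggered_crontab(hosts: List[str], nodes_per_night: int, hour: int = 2) -> List[str]:
    """Give each host one night of the week for a major compaction, so that at
    most `nodes_per_night` hosts compact on any given night."""
    if nodes_per_night < 1 or len(hosts) > 7 * nodes_per_night:
        raise ValueError("need nodes_per_night >= 1 and at most 7 * nodes_per_night hosts")
    lines = []
    for i, host in enumerate(hosts):
        day_of_week = i // nodes_per_night  # cron day-of-week field: 0 = Sunday
        lines.append(f"0 {hour} * * {day_of_week} nodetool -h {host} compact")
    return lines


# Usage with hypothetical host names: two nodes per night.
for line in staggered_crontab(["cass1", "cass2", "cass3", "cass4", "cass5"], nodes_per_night=2):
    print(line)
```

Running every entry at the same hour keeps the sketch short; offsetting the minute field per host would further avoid two nodes compacting at exactly the same time.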
