cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: How bad is the impact of compaction on performance?
Date Sat, 05 Feb 2011 18:07:46 GMT
On Sat, Feb 5, 2011 at 12:48 PM, buddhasystem <> wrote:
> Thanks Edward. In our usage scenario, there is never downtime, it's a global
> 24/7 operation.
> What is impacted the worst, the read or write?
> How does a node handle compaction when there is a spike of writes coming to
> it?
> Edward Capriolo wrote:
>> On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem <> wrote:
>>> Just wanted to see if someone with experience in running an actual
>>> service can advise me:
>>> how often do you run nodetool compact on your nodes? Do you stagger it in
>>> time, for each node? How badly is performance affected?
>>> I know this all seems too generic but then again no two clusters are
>>> created equal anyhow. Just wanted to get a feel.
>>> Thanks,
>>> Maxim
>> This is an interesting topic. Cassandra can now remove tombstones on
>> non-major compaction. For some use cases you may not have to trigger
>> nodetool compact yourself to remove tombstones. Use cases that do not
>> do many updates or deletes may have the least need to run compaction
>> yourself.
>> However, if you have smaller SSTables or fewer SSTables, your read
>> operations will be more efficient. If you have downtime, such as from
>> 1AM-6AM, going through a major compaction might shrink your dataset
>> significantly, and that will make reads better.
>> Compaction can be more or less intensive. The largest factor is row
>> size. Users with large rows probably see faster compaction, while
>> users with smaller rows see it take a long time. You can lower the
>> priority of the compaction thread for experimentation.
>> As to performance, you want to get your cluster to the state where
>> it is not compacting often. This may mean you need more nodes to
>> handle writes.
>> I graph the compaction information from JMX
>> to get a feel for how often a node is compacting on average. I also
>> cross-reference the compaction with the read latency and IO graphs I
>> have to see what impact compaction has on reads.
>> Forcing a major compaction also lowers the chances a compaction will
>> happen during the day at peak time. I major compact a few cluster
>> nodes each night through cron (gc time 3 days). This has been good for
>> keeping our data on disk as small as possible. Forcing the major
>> compaction at night uses IO, but I find it saves IO over the course of
>> the day because each read seeks less on disk.

It does not have to be downtime. It just has to be a slow time. Use
your traffic graphs to run major compact at the slowest time so it has
the least impact on performance.
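
As a rough illustration of picking that slow time from traffic data, here
is a minimal Python sketch. It assumes you can export 24 hourly request
counts from your graphs; the function name and the input format are
hypothetical and not part of Cassandra or any monitoring tool.

```python
def quietest_window(hourly_requests, window_hours):
    """Return the start hour (0-23) of the quietest window of the day.

    hourly_requests: 24 numbers, the requests seen in each hour (assumed
    to be exported from your own traffic graphs).
    window_hours: how long the compaction is expected to run, in hours.
    The window is allowed to wrap past midnight.
    """
    if len(hourly_requests) != 24:
        raise ValueError("expected 24 hourly samples")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")

    best_start, best_total = 0, None
    for start in range(24):
        # Sum the traffic over the window, wrapping around midnight.
        total = sum(hourly_requests[(start + i) % 24]
                    for i in range(window_hours))
        # Strict comparison keeps the earliest window when totals tie.
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start
```

The returned hour could then be used as the hour field of the cron entry
that triggers the major compaction.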

Compaction does not generally affect writes or bursts of writes,
especially if your writes go to a separate commit log disk.

In the best case scenario compaction may not affect your performance
at all. An example of this would be a use case where nearly 100% of
reads are served from the row cache, so disk is not a factor.

Generally speaking, if you have good fast hard disks and only a single
node is compacting at a given time, the cluster absorbs this. In 0.7.0
the dynamic snitch should help re-route traffic away from slower nodes
for even less impact. In other words, making compaction "non impacting"
is all about capacity.
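
One way to keep only a single node compacting at a time is to rotate
through the cluster a few nodes per night and run them one after another.
The Python sketch below is only an illustration under some assumptions:
the host names, the helper names, and the per-night count are
hypothetical, and it assumes `nodetool -h <host> compact` returns only
once that node has finished compacting.

```python
import subprocess


def nodes_for_night(nodes, per_night, night_index):
    """Pick which nodes to major compact on a given night.

    Rotates through the node list so every node gets its turn.
    night_index is any increasing counter, e.g. days since some start date.
    """
    if per_night < 1:
        raise ValueError("per_night must be at least 1")
    if not nodes:
        return []
    count = min(per_night, len(nodes))
    start = (night_index * per_night) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


def compact_sequentially(hosts, run=subprocess.run):
    """Run a major compaction on each host, one at a time.

    Each call blocks until the nodetool process exits, so the next host
    does not start until the previous command has returned.
    """
    for host in hosts:
        run(["nodetool", "-h", host, "compact"], check=True)
```

A nightly cron job could call `compact_sequentially(nodes_for_night(...))`
at the quiet hour. The `run` parameter is there only so the command can be
replaced with a stub when trying the script out.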
