cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: How bad is teh impact of compaction on performance?
Date Sat, 05 Feb 2011 17:19:49 GMT
On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem <potekhin@bnl.gov> wrote:
>
> Just wanted to see if someone with experience in running an actual service
> can advise me:
>
> how often do you run nodetool compact on your nodes? Do you stagger it in
> time, for each node? How badly is performance affected?
>
> I know this all seems too generic but then again no two clusters are created
> equal anyhow. Just wanted to get a feel.
>
> Thanks,
> Maxim

This is an interesting topic. Cassandra can now remove tombstones
during non-major compactions, so for some use cases you may not have
to trigger nodetool compact yourself to remove tombstones. Use cases
that do not involve many updates or deletes probably have the least
need to run compaction manually.

However, if you have smaller SSTables, or fewer SSTables, your read
operations will be more efficient.

If you have a quiet window, such as 1AM-6AM, going through a major
compaction might shrink your dataset significantly, and that will
make reads better.

Compaction can be more or less intensive. The largest factor is row
size. Users with large rows probably see faster compaction, while
with smaller rows it takes a long time. You can lower the priority of
the compaction thread for experimentation.

As for performance, you want to get your cluster to the state where
it is not compacting often. This may mean you need more nodes to
handle writes.

I graph the compaction information from JMX
http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp
to get a feel for how often a node is compacting on average. I also
cross-reference the compaction graphs with the read latency and IO
graphs I have to see what impact compaction has on reads.
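
If you would rather put a number on that cross-reference than eyeball
two graphs, something like the sketch below might help. It is only an
illustration: the function name, the sample format (timestamp, read
latency in ms) and the compaction windows (start, end) are all
assumptions, and the numbers are made up rather than taken from my
cluster.

```python
from statistics import mean

def latency_by_compaction_state(samples, compaction_windows):
    """Split read-latency samples by whether a compaction was running.

    samples: list of (timestamp, read_latency_ms) tuples (hypothetical format)
    compaction_windows: list of (start, end) timestamps, end exclusive
    Returns (mean latency during compaction, mean latency otherwise);
    either value is None if no samples fall in that bucket.
    """
    def compacting(ts):
        return any(start <= ts < end for start, end in compaction_windows)

    during = [lat for ts, lat in samples if compacting(ts)]
    idle = [lat for ts, lat in samples if not compacting(ts)]
    return (mean(during) if during else None,
            mean(idle) if idle else None)

# Made-up samples: one reading per minute, one compaction from t=100 to t=200.
samples = [(0, 4.0), (60, 5.0), (120, 12.0), (180, 14.0), (240, 5.0)]
windows = [(100, 200)]
during, idle = latency_by_compaction_state(samples, windows)
print("during compaction: %.1f ms, otherwise: %.1f ms" % (during, idle))
```

The timestamps and latencies would come from whatever you already
collect for your graphs; the code does not talk to JMX itself.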

Forcing a major compaction also lowers the chances that a compaction
will happen during the day at peak time. I major compact a few
cluster nodes each night through cron (GC grace time of 3 days). This
has been good for keeping our data on disk as small as possible.
Forcing the major compaction at night uses IO, but I find it saves IO
over the course of the day because each read seeks less on disk.
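
To give an idea of what staggering "a few nodes each night" can look
like, here is a rough sketch of a script that cron could run once a
night. It is not my actual cron job: the host names, the per-night
count and the helper names are placeholders, and it only prints the
nodetool command unless dry_run is turned off.

```python
import datetime
import subprocess

def nodes_for_night(nodes, per_night, day_index):
    """Pick the nodes to major compact on a given night.

    Walks through the node list per_night hosts at a time and wraps
    around at the end, so every node gets its turn.
    """
    if not nodes or per_night <= 0:
        return []
    per_night = min(per_night, len(nodes))
    start = (day_index * per_night) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(per_night)]

def compact(host, dry_run=True):
    """Run (or, by default, just print) a major compaction for one host."""
    cmd = ["nodetool", "-h", host, "compact"]
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.call(cmd)

if __name__ == "__main__":
    # Placeholder host names; replace with your own nodes.
    nodes = ["cass1", "cass2", "cass3", "cass4", "cass5", "cass6"]
    tonight = datetime.date.today().toordinal()
    for host in nodes_for_night(nodes, 2, tonight):
        compact(host)
```

With six nodes and two per night, each node comes up once every three
nights. Pick per_night so the nightly work fits inside your quiet
window.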
