cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Rules for Major Compaction
Date Tue, 19 Jun 2012 19:30:20 GMT
Hey my favorite question! It is a loaded question and it depends on
your workload. The answer has evolved over time.

In the old days (before 0.6.5) the only way to remove tombstones was a
major compaction. This is not true in any modern version.

(Also, in the old days you had to run cleanup to clear hints.)

Cassandra now has two compaction strategies, SizeTiered and Leveled.
Column families using Leveled compaction can not be manually compacted.


Your final two sentences are good ground rules. In our case we have
some column families with high churn, for example a gc_grace period
of 4 days where the data is completely re-written every day. Write
activity over time will eventually cause tombstone removal, but we can
expedite the process by forcing a major compaction at night. Because
the tables are not really growing, the **warning** below does not apply.
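
To make the gc_grace example above concrete, here is a rough sketch in
Python. It is a simplified model, not how Cassandra implements it: it
only says a tombstone becomes eligible for removal by a compaction once
gc_grace has passed since the delete. The function names are made up
for illustration, and the keyspace and column family names are
placeholders. The only real command referenced is `nodetool compact`,
which the second helper builds as an argument list without running it.

```python
from datetime import datetime, timedelta
from typing import List

# gc_grace from the example above (an assumption; use your own setting).
GC_GRACE = timedelta(days=4)


def tombstone_purgeable(deleted_at: datetime, now: datetime,
                        gc_grace: timedelta = GC_GRACE) -> bool:
    """Simplified model: a compaction may drop a tombstone only after
    gc_grace has elapsed since the delete was written."""
    return now - deleted_at > gc_grace


def major_compaction_cmd(keyspace: str, column_family: str) -> List[str]:
    """Build (but do not run) a nodetool invocation that forces a major
    compaction of one column family. Names passed in are placeholders."""
    return ["nodetool", "compact", keyspace, column_family]


now = datetime(2012, 6, 19, 3, 0, 0)
old_delete = now - timedelta(days=5)     # past gc_grace
recent_delete = now - timedelta(days=1)  # still inside gc_grace
```

With a 4 day gc_grace and daily rewrites, most of the data on disk at
any moment is shadowed or deleted, which is why a nightly job running
the command built by `major_compaction_cmd` can reclaim space sooner
than waiting for normal write activity to trigger compactions.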

**Warning**: this creates one large sstable, which is not always
desirable because it fiddles with the heuristics of SizeTiered
(you end up with one big sstable and other, much smaller ones).
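
A toy model of the warning above, in Python. This is only a sketch of
the size-tiered idea (group sstables of similar size, compact a group
once it has enough members), not Cassandra's actual code. The bucket
bounds of 0.5x to 1.5x of the bucket average and the threshold of 4 are
assumptions for the illustration, and the sizes are invented.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group sstable sizes into buckets of similar size. A size joins
    the first bucket whose average it falls within [low, high] times."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            # No bucket of similar size exists yet, so start a new one.
            buckets.append([size])
    return buckets


def compactable(buckets, min_threshold=4):
    """Only buckets with at least min_threshold members get compacted."""
    return [b for b in buckets if len(b) >= min_threshold]


# One 5000 MB sstable left by a major compaction, plus four fresh ones.
sizes_mb = [10, 11, 9, 10, 5000]
buckets = bucket_by_size(sizes_mb)
ready = compactable(buckets)
```

In this model the four small sstables compact together, but the big
one sits alone in its bucket. It will not be compacted again until
several other sstables of comparable size exist, so anything deleted or
overwritten inside it stays on disk for a long time.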

The updated answer is: "You probably do not want to run major
compactions, but some use cases could see some benefits."

On Tue, Jun 19, 2012 at 10:51 AM, Raj N <raj.cassandra@gmail.com> wrote:
> DataStax recommends not to run major compactions. Edward Capriolo's
> Cassandra High Performance book suggests that major compaction is a good
> thing. And should be run on a regular basis. Are there any ground rules
> about running major compactions? For example, if you have write-once kind of
> data that is never updated  then it probably makes sense to not run major
> compaction. But if you have data which can be deleted or overwritten does it
> make sense to run major compaction on a regular basis?
>
> Thanks
> -Raj
