cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Manual Compaction in Production
Date Tue, 09 Nov 2010 02:23:11 GMT
On Mon, Nov 8, 2010 at 6:16 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> After Sylvain added support for removing tombstones during minor
> compactions in 0.6.6 (see
> http://www.riptano.com/blog/whats-new-cassandra-066), doing major
> compactions should be considered unnecessary until otherwise
> demonstrated for your workload.  (If you happen to have weekly slow
> periods into which major compaction fits conveniently, then great, it
> won't hurt things, but otherwise, leave it off.)
>
> On Mon, Nov 8, 2010 at 4:07 PM, Wayne <wav100@gmail.com> wrote:
>> Can anyone speak to best practices for running manual compaction in
>> production? Our assumption is that without it the sstables will become too
>> fragmented...is this an accepted "fact"? Obviously it depends on the volume
>> of writes, but I am looking for current production practices.
>>
>> Since it takes a lot of resources and 4-5 hours for our current node size of
>> 500 GB, weekly seems like a sensible option for us. Is this a normal practice?
>>
>> Is it best to run on all nodes at the same time or staggered across nodes to
>> reduce total cluster slow-down? Given that full compaction has a major
>> effect on a node and its ability to function under heavy load, our assumption
>> is that staggered over the weekend for example (our low usage time) would be
>> best.
>>
>> Any recommendations?
>>
>> Thanks
>>
>> Wayne
>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

I am using a build with support for removing tombstones during minor
compactions. I am pretty happy to see SSTables shrink during non-major
compactions. If I understand correctly, bloom filters have false
positives, so a key may appear to be in other SSTables and therefore not
be removed by minor compaction.
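A minimal Python sketch of the first scenario, assuming (as described above) that a minor compaction drops a tombstone only when the bloom filters of every SSTable left out of that compaction report the key as absent. `BloomFilter` and `can_purge_tombstone` are hypothetical names, not Cassandra's API, and the gc-grace-period check that also gates tombstone removal is left out:

```python
import hashlib


class BloomFilter:
    """Toy bloom filter with num_bits bits and num_hashes hash functions."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, key):
        # Derive each bit position by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # Always True for added keys; sometimes True for keys never added.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def can_purge_tombstone(key, filters_outside_compaction):
    """Hypothetical rule: drop the tombstone only if no SSTable outside the
    compaction might still hold older data for this key."""
    return not any(f.might_contain(key) for f in filters_outside_compaction)
```

A deliberately tiny filter forces the false positive described above: `"row-b"` was never written to the old SSTable, yet its tombstone cannot be purged.

```python
old = BloomFilter(num_bits=1)  # every key maps to the same single bit
old.add("row-a")
print(can_purge_tombstone("row-b", [old]))  # False
```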

Also, I have no data to back this up, but consider a node holding a large
amount of data (~400 GB) while the daily data inserted is only ~1 GB/day.
It may be many days from the time of the delete request until the SSTables
containing the key get even minor compacted.
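A rough way to put numbers on the second scenario is a toy size-tiered model. Everything here is an assumption for illustration: memtables flush at 0.25 GB, SSTables fall into fixed tiers by powers of 4 times the flush size, a tier is merged as soon as it holds 4 SSTables, and nothing is overwritten. Cassandra's real bucketing groups SSTables of similar size rather than using fixed tiers, so treat the output as an order of magnitude only. `tier` and `days_until_compacted` are hypothetical helpers:

```python
def tier(units, threshold):
    """Tier t holds SSTables of size [threshold**t, threshold**(t+1)) flush units."""
    t = 0
    while units >= threshold ** (t + 1):
        t += 1
    return t


def days_until_compacted(big_gb=400, daily_gb=1, flush_gb=0.25,
                         threshold=4, max_days=5000):
    """Return the first day on which the pre-existing big SSTable is picked up
    by a minor compaction, or None if that never happens within max_days."""
    flushes_per_day = round(daily_gb / flush_gb)
    # Each SSTable is (size in flush units, is it the pre-existing big one?).
    sstables = [(round(big_gb / flush_gb), True)]
    for day in range(1, max_days + 1):
        for _ in range(flushes_per_day):
            sstables.append((1, False))  # one memtable flush
            merged = True
            while merged:  # keep cascading until no tier is full
                merged = False
                by_tier = {}
                for sst in sstables:
                    by_tier.setdefault(tier(sst[0], threshold), []).append(sst)
                for group in by_tier.values():
                    if len(group) < threshold:
                        continue
                    if any(is_big for _, is_big in group):
                        return day
                    for sst in group:
                        sstables.remove(sst)
                    sstables.append((sum(size for size, _ in group), False))
                    merged = True
                    break
    return None


print(days_until_compacted())
```

With the defaults (a 400 GB SSTable, 1 GB/day) this model returns 768: roughly two years of writes before the big SSTable takes part in a minor compaction, so data deleted from it is not reclaimed until then.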

Wouldn't these two scenarios (and possibly others) still require major
compaction to bring you down to the lowest possible disk utilization?

Edward
