incubator-cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: simple question about merged SSTable sizes
Date Wed, 22 Jun 2011 17:22:17 GMT
I would not say avoid major compactions at all costs.

In the old days (< 0.6.5, IIRC) the only way to clear tombstones was a major
compaction. The nice thing about major compaction: suppose you have a situation
with 4 SSTables at 2GB each (8GB total). Under normal write
conditions it could be more than gc_grace days before a deleted row gets
cleared from disk, and it is hard to say exactly how long before overwritten
rows have their duplicates removed.
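For reference, gc_grace is a per-column-family setting; in the 0.7-era cassandra-cli it can be adjusted with something like the following sketch (the column family name is a placeholder, and the exact attribute syntax may vary by version):

```
update column family MyCF with gc_grace = 864000;
```

That would set the tombstone grace period to 10 days (864000 seconds); lower values let tombstones be purged sooner, at the cost of a shorter window for repairing down nodes.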

Even with bloom filters and indexes, the fact remains that fewer, smaller
tables search faster (truly a less-is-more scenario). If you force a major
compaction at night you might bring this table down to 4GB or 6GB, so it
takes less space on disk. This uses disk bandwidth initially, but once done
your page cache is much more effective. Since compacting a small table
does not take very long, it is a win/win. For tables that need to stay small
I might major compact them every other day.
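A nightly forced major can be scheduled with cron and nodetool; a minimal sketch, where the keyspace/column family names and the install path are placeholders, not anything from this thread:

```
# crontab entry: force a major compaction of one CF every night at 03:00
# MyKeyspace and MyCF are hypothetical; adjust host and path to your install
0 3 * * * /opt/cassandra/bin/nodetool -h localhost compact MyKeyspace MyCF
```

Keeping the schedule off-peak matters because the major will read and rewrite the whole column family's data on disk.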

As you pointed out, when your SSTables get larger the situation becomes less
of a win/win, mostly because they take much longer to compact, so majors
get harder to schedule. At that point it is sometimes better to let
compaction run its "natural course" instead of forcing a major.
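For those large tables, 0.8 also lets you throttle the compaction rate itself so a long-running compaction does not saturate I/O. A sketch of the relevant cassandra.yaml knob (the value shown is only an assumed starting point to tune for your hardware):

```
# cassandra.yaml: cap total compaction throughput for the node (0.8+)
# 16 MB/s is an assumption; 0 disables throttling entirely
compaction_throughput_mb_per_sec: 16
```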

On Wed, Jun 22, 2011 at 1:03 PM, Jonathan Colby <> wrote:

> So the take-away is try to avoid major compactions at all costs!   Thanks
> Ed and Eric.
> On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote:
> Yes, if you are not deleting fast enough they will grow. This is not
> specifically a Cassandra problem; /var/log/messages has the same issue.
> There is a JIRA ticket about having a maximum size for SSTables, so they
> always stay manageable.
> You fall into a small trap when you force major compaction: many small
> tables turn into one big one, and from there it is hard to get back to
> many smaller ones again. The other side of the coin is that if you do not
> major compact you can end up with much more disk usage than live data
> (i.e. a large % of disk is overwrites and tombstones).
> You can tune the compaction rate now so compaction does not kill your IO.
> Generally I think avoiding really large SSTables is the best way to go.
> Scale out and avoid very large SSTables per node if possible.
> Edward
> On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby <
> > wrote:
>> The way compaction works,  "x" same-sized files are merged into a new
>> SSTable.  This repeats itself and the SSTables get bigger and bigger.
>> So what is the upper limit??     If you are not deleting stuff fast
>> enough, wouldn't the SSTable sizes grow indefinitely?
>> I ask because we have some rather large SSTable files (80-100 GB) and I'm
>> starting to worry about future compactions.
>> Second, compacting such large files is an IO killer.    What can be tuned
>> other than compaction_threshold to help optimize this and prevent the files
>> from getting too big?
>> Thanks!
