incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: simple question about merged SSTable sizes
Date Wed, 22 Jun 2011 17:00:04 GMT
Yes, if you are not deleting data fast enough, they will grow. This is not
specifically a Cassandra problem; /var/log/messages has the same issue.
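To see why nothing caps the largest file, here is a toy model of size-tiered compaction. It is a simplification, not Cassandra's code. It assumes every flush writes brand-new keys, so a merge's output is the sum of its inputs. It uses a threshold of 4 to mirror the default min_compaction_threshold, and it groups SSTables whose size is roughly 0.5x to 1.5x their bucket's average. The function names are hypothetical.

```python
# Toy model of size-tiered compaction (a simplification, not Cassandra's code).
# Assumes every flush writes new keys only, so a merge's output size is the
# sum of its inputs, and uses a threshold of 4 like min_compaction_threshold.
MIN_THRESHOLD = 4


def buckets(sizes, low=0.5, high=1.5):
    """Group sizes that fall within [low, high] times their bucket's average."""
    groups = []
    for size in sorted(sizes):
        for group in groups:
            avg = sum(group) / len(group)
            if low * avg <= size <= high * avg:
                group.append(size)
                break
        else:
            groups.append([size])
    return groups


def simulate(n_flushes, flush_size=1):
    """Flush n_flushes memtables, compacting after each flush; return SSTable sizes."""
    sizes = []
    for _ in range(n_flushes):
        sizes.append(flush_size)
        merged = True
        while merged:  # a merge can fill the next bucket up, so keep going
            merged = False
            for group in buckets(sizes):
                if len(group) >= MIN_THRESHOLD:
                    chosen = group[:MIN_THRESHOLD]
                    for size in chosen:
                        sizes.remove(size)
                    sizes.append(sum(chosen))
                    merged = True
                    break
    return sorted(sizes)


print(simulate(100))  # [4, 16, 16, 64]
print(simulate(256))  # [256]: one SSTable holding everything ever written
```

The sizes behave like a base-4 counter. Whenever four files of one size exist, they collapse into one file four times bigger. Without overwrites or deletes, the largest SSTable therefore tracks the total data written.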

There is a JIRA ticket about having a maximum size for SSTables, so that
they always stay manageable.
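One way to picture such a cap is a merge that closes its current output file once it reaches a limit and starts a new one. Each output then covers a contiguous key range. The sketch below counts rows instead of bytes. merge_capped and max_rows are hypothetical names, and this is an illustration of the idea, not the design in the ticket.

```python
import heapq


def merge_capped(sstables, max_rows):
    """Merge key-sorted SSTables, listed oldest first, into outputs of <= max_rows rows.

    Each SSTable is a list of (key, value) pairs sorted by key. When a key
    appears in several inputs, the value from the newest input wins.
    """
    # Tag rows with -generation so that, for equal keys, the newest sorts first.
    tagged = [
        [(key, -gen, value) for key, value in table]
        for gen, table in enumerate(sstables)
    ]
    outputs, current, last_key = [], [], None
    for key, _, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # an older version of a key already written
        last_key = key
        current.append((key, value))
        if len(current) == max_rows:  # cap reached: start a new output SSTable
            outputs.append(current)
            current = []
    if current:
        outputs.append(current)
    return outputs


old = [("a", 1), ("b", 1), ("c", 1)]
new = [("b", 2), ("d", 2)]
print(merge_capped([old, new], max_rows=2))
# [[('a', 1), ('b', 2)], [('c', 1), ('d', 2)]]
```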

You fall into a small trap when you force a major compaction: many small
tables turn into one big one, and from there it is hard to get back to many
smaller ones again. The other side of the coin is that if you do not major
compact, you can end up with much more disk usage than live data (i.e. a
large % of the disk is overwrites and tombstones).
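The gap between disk usage and live data can be made concrete by counting rows. The sketch below lists SSTables oldest first. It uses None as a stand-in for a tombstone, which is a hypothetical encoding, and it compares rows stored on disk with rows that are actually live.

```python
def disk_vs_live(sstables):
    """Return (rows on disk, live rows) for SSTables listed oldest first.

    Each SSTable is a list of (key, value) pairs; a value of None stands in
    for a tombstone. Even a major compaction keeps a tombstone until
    gc_grace_seconds has passed, so "live" is a lower bound on what it leaves.
    """
    on_disk = sum(len(table) for table in sstables)
    latest = {}
    for table in sstables:  # newer tables override older ones
        for key, value in table:
            latest[key] = value
    live = sum(1 for value in latest.values() if value is not None)
    return on_disk, live


tables = [
    [("k1", "v1"), ("k2", "v1"), ("k3", "v1")],
    [("k1", "v2"), ("k2", "v2"), ("k3", "v2")],
    [("k1", "v3"), ("k2", "v3"), ("k3", "v3")],
    [("k1", None)],  # delete k1
]
print(disk_vs_live(tables))  # (10, 2)
```

Three full rewrites plus one delete leave 10 rows on disk for 2 live rows. That is the space a major compaction would reclaim, at the price of producing one big file.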

You can tune the compaction rate now, so compaction does not kill your IO.
Generally I think avoiding really large SSTables is the best way to go.
Scale out and avoid very large SSTables per node if possible.
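The email doesn't name the setting. Assuming it is the compaction_throughput_mb_per_sec option in cassandra.yaml, which limits compaction to a fixed number of MB/s, the trade-off is simple arithmetic. A lower rate spares live IO but stretches each compaction out. The function below is a hypothetical helper that gives a rough estimate. It also assumes the throttle, not the disks, is the bottleneck.

```python
def compaction_hours(input_gb, throughput_mb_per_sec):
    """Rough wall-clock hours for a compaction reading input_gb at a throttled rate.

    Assumes 1 GB = 1024 MB and that the throttle, not the disks, is the limit.
    """
    seconds = input_gb * 1024 / throughput_mb_per_sec
    return seconds / 3600


for input_gb in (80, 100, 400):  # 400 GB: four ~100 GB SSTables merged together
    for rate in (16, 32, 64):
        print(f"{input_gb} GB at {rate} MB/s: {compaction_hours(input_gb, rate):.1f} h")
```

With size-tiered compaction, the inputs that matter are whole buckets. Merging four ~100 GB files means reading roughly 400 GB, which is one more reason to keep per-node SSTables small.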

Edward


On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
<jonathan.colby@gmail.com> wrote:

>
> The way compaction works, "x" same-sized files are merged into a new
> SSTable. This repeats itself and the SSTables get bigger and bigger.
>
> So what is the upper limit??     If you are not deleting stuff fast enough,
> wouldn't the SSTable sizes grow indefinitely?
>
> I ask because we have some rather large SSTable files (80-100 GB) and I'm
> starting to worry about future compactions.
>
> Second, compacting such large files is an IO killer.    What can be tuned
> other than compaction_threshold to help optimize this and prevent the files
> from getting too big?
>
> Thanks!
