cassandra-user mailing list archives

From: Jonathan Colby <jonathan.co...@gmail.com>
Subject: Re: simple question about merged SSTable sizes
Date: Wed, 22 Jun 2011 17:03:16 GMT
So the take-away is to try to avoid major compactions at all costs! Thanks Ed and Eric.

On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote:

> Yes, if you are not deleting data fast enough, they will grow. This is not specifically a Cassandra problem; /var/log/messages has the same issue.
> 
> There is a JIRA ticket about having a maximum size for SSTables, so they always stay manageable.
> 
> You fall into a small trap when you force a major compaction: many small SSTables turn into one big one, and from there it is hard to get back to many smaller ones again. The other side of the coin is that if you never major compact, you can end up with much more disk usage than live data (i.e., a large % of the disk is overwrites and tombstones).
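> 
> For reference, a major compaction is the thing you trigger by hand with nodetool (keyspace and column family names here are just placeholders):
> 
>     nodetool -h localhost compact MyKeyspace MyColumnFamily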
> 
> You can tune the compaction rate now so compaction does not kill your IO. Generally I think avoiding really large SSTables is the best way to go: scale out and avoid very large SSTables per node if possible.
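> 
> The throttle lives in cassandra.yaml; a minimal sketch, assuming 0.8 (16 is the shipped default, 0 disables throttling):
> 
>     # cap total compaction throughput across the node, in MB/s
>     compaction_throughput_mb_per_sec: 16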
> 
> Edward
> 
> 
> On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby <jonathan.colby@gmail.com> wrote:
> 
> The way compaction works, "x" same-sized files are merged into a new SSTable. This repeats itself and the SSTables get bigger and bigger.
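> 
> If my math is right, with the default min_compaction_threshold of 4 and memtable flushes around 100 MB (numbers picked purely for illustration), each generation is roughly 4x the last:
> 
>     4 x 100 MB  -> 400 MB
>     4 x 400 MB  -> 1.6 GB
>     4 x 1.6 GB  -> 6.4 GB
>     4 x 6.4 GB  -> 25.6 GB
>     4 x 25.6 GB -> ~100 GB
> 
> which is about where our biggest files are now.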
> 
> So what is the upper limit? If you are not deleting stuff fast enough, wouldn't the SSTable sizes grow indefinitely?
> 
> I ask because we have some rather large SSTable files (80-100 GB) and I'm starting to worry about future compactions.
> 
> Second, compacting such large files is an IO killer. What can be tuned other than compaction_threshold to help optimize this and prevent the files from getting too big?
> 
> Thanks!
> 

