incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Compaction doubles disk space
Date Wed, 30 Mar 2011 07:08:53 GMT
When a compaction needs to write a file, Cassandra will try to find a place to put the new file,
based on an estimate of its size. If it cannot find enough space it will trigger a JVM GC, which
will delete any previously compacted, and so unneeded, SSTables. The same thing will happen
when a new SSTable needs to be written to disk.
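
As a rough illustration of that find-space-then-retry behaviour, here is a minimal Python sketch. It is not Cassandra's code: pick_directory, free_space and reclaim are made-up names, and reclaim only stands in for "trigger a GC so obsolete SSTables get deleted". A pessimistic size estimate (say, the sum of the input files) is my assumption about why a compaction can want roughly as much free space as the data it is compacting.

```python
def pick_directory(dirs, estimated_size, free_space, reclaim):
    """Return a directory with room for estimated_size bytes, or None.

    free_space(d) -> bytes currently available in directory d (hypothetical helper).
    reclaim()     -> stand-in for triggering a GC that removes obsolete files.
    """
    for attempt in range(2):
        # Prefer the directory with the most free space.
        best = max(dirs, key=free_space)
        if free_space(best) >= estimated_size:
            return best
        if attempt == 0:
            # Not enough room anywhere: try to free space once, then look again.
            reclaim()
    return None


if __name__ == "__main__":
    free = {"/data1": 100, "/data2": 50}

    def reclaim():
        # Pretend that obsolete files worth 200 bytes were deleted from /data2.
        free["/data2"] += 200

    print(pick_directory(["/data1", "/data2"], 150, free.get, reclaim))
```

With the numbers above, the first pass finds no directory with 150 bytes free, reclaim runs once, and the second pass picks /data2.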

Minor compaction groups the SSTables on disk into buckets of similar sizes (http://wiki.apache.org/cassandra/MemtableSSTable);
each bucket is processed in its own compaction task. Under 0.7 compaction is single threaded,
and when each compaction task starts it will try to find space on disk and, if necessary, trigger
GC to free space.
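
To show what "buckets of similar sizes" could look like, here is a small Python sketch. The 0.5x to 1.5x window around the bucket average and the minimum of 4 files per compaction are assumptions I picked for the example, not values taken from this thread, and the function names are made up.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group file sizes so each file is close to the average of its bucket."""
    buckets = []  # each bucket is a list of file sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is a good fit, so start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Only buckets with at least min_threshold files become compaction tasks."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


if __name__ == "__main__":
    sizes = [10, 11, 12, 9, 100, 105, 1000]
    print(bucket_by_size(sizes))
    print(compaction_candidates(sizes))
```

In this example only the bucket of four small files qualifies, so that is the only compaction task that would have to find disk space.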
 
SSTables are immutable on disk. Compaction cannot delete data from them because they are also used
to serve read requests at the same time; doing so would require locking around (regions of)
the file.

Also, as far as I understand, we cannot immediately delete files because other operations (including
repair) may be using them. The data in the pre-compacted files is just as correct as the data
in the compacted file; the compacted file is just more compact. So the easiest thing to do is let
the JVM sort out if anything else is using them.

Perhaps it could be improved by actively tracking which files are in use so they may be deleted
more quickly. But right now, so long as unused space is freed when needed, it's working as designed AFAIK.
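
For what it's worth, tracking which files are in use could be as simple as reference counting. The Python sketch below is purely hypothetical (SSTableTracker and its methods are not a Cassandra API) and is single threaded; a real version would need locking around the counters.

```python
import os


class SSTableTracker:
    """Hypothetical reference counter for data files; not a Cassandra API."""

    def __init__(self, delete=os.remove):
        self._refs = {}         # path -> number of active users (reads, repair, ...)
        self._obsolete = set()  # paths that a compaction has replaced
        self._delete = delete

    def acquire(self, path):
        # New users must not start on a file that was already compacted away.
        if path in self._obsolete:
            raise ValueError("file was already compacted: " + path)
        self._refs[path] = self._refs.get(path, 0) + 1

    def release(self, path):
        self._refs[path] -= 1
        if self._refs[path] == 0:
            del self._refs[path]
            if path in self._obsolete:
                # Last user is done, so the file can go right away.
                self._delete(path)

    def mark_compacted(self, path):
        if path in self._obsolete:
            return  # already handled
        self._obsolete.add(path)
        if path not in self._refs:
            # Nobody is using the file: delete immediately.
            self._delete(path)


if __name__ == "__main__":
    deleted = []
    tracker = SSTableTracker(delete=deleted.append)
    tracker.acquire("old-1-Data.db")         # a read starts
    tracker.mark_compacted("old-1-Data.db")  # compaction finishes
    tracker.release("old-1-Data.db")         # the read ends, file is deleted
    print(deleted)
```

The file is removed as soon as the last user releases it, rather than whenever the next GC happens to run.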


That's my understanding, hope it helps explain why it works that way.
Aaron

On 30 Mar 2011, at 13:32, Sheng Chen wrote:

> Yes.
> I think at least we can remove the tombstones for each sstable first, and then do the merge.
> 
> 2011/3/29 Karl Hiramoto <karl@hiramoto.org>
> Would it be possible to improve the current compaction disk space issue by compacting only a few SSTables at a time, then immediately deleting the old ones? Looking at the logs it seems like deletions of old SSTables are taking longer than necessary.
> 
> --
> Karl
> 

