cassandra-user mailing list archives

From Sheng Chen <>
Subject Re: Compaction doubles disk space
Date Thu, 31 Mar 2011 06:39:20 GMT
It really helps. Thank you very much.


2011/3/30 aaron morton <>

> When a compaction needs to write a file, Cassandra will try to find a place
> to put the new file, based on an estimate of its size. If it cannot find
> enough space, it will trigger a GC, which will delete any previously compacted
> (and therefore unneeded) SSTables. The same thing will happen when a new
> SSTable needs to be written to disk.
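The find-space-or-GC step described above can be sketched roughly as follows. This is an illustrative Java sketch, not Cassandra's actual code; the method name, the data-directory array, and the single GC-assisted retry are all assumptions.

```java
// Illustrative sketch: pick a data directory with room for the estimated
// compaction output, triggering a GC once if no directory has enough space.
import java.io.File;

public class CompactionSpace {
    // Return a directory with enough usable space for the estimated output
    // file, or null after one GC-assisted retry. (Hypothetical helper.)
    static File findDirectory(File[] dataDirs, long estimatedSize) {
        for (int attempt = 0; attempt < 2; attempt++) {
            for (File dir : dataDirs) {
                if (dir.getUsableSpace() >= estimatedSize) {
                    return dir;
                }
            }
            // No room: ask the JVM to collect, which lets unreferenced
            // handles to obsolete SSTables be finalized and their files freed.
            System.gc();
        }
        return null;
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        File dir = findDirectory(new File[] { tmp }, 1024L);
        System.out.println(dir);
    }
}
```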
> Minor compaction groups the SSTables on disk into buckets of similar sizes
> (each bucket is processed in its own compaction task). Under 0.7, compaction
> is single threaded, and when each compaction task starts it will try to find
> space on disk and, if necessary, trigger a GC to free space.
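The bucketing of similar-sized SSTables can be sketched as below. The exact grouping rule (a new bucket whenever a file exceeds 1.5x the current bucket's running average) is an assumption for illustration, not necessarily Cassandra's precise thresholds.

```java
// Illustrative sketch of grouping SSTable sizes into buckets of similar size,
// in the spirit of minor compaction's size-tiered bucketing.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SizeBuckets {
    // Group sorted sizes; start a new bucket when a size exceeds 1.5x the
    // current bucket's running average (assumed threshold).
    static List<List<Long>> bucket(List<Long> sizes) {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        double avg = 0;
        for (long size : sorted) {
            if (!current.isEmpty() && size > avg * 1.5) {
                buckets.add(current);            // close this bucket
                current = new ArrayList<>();
            }
            current.add(size);
            avg = current.stream().mapToLong(Long::longValue).average().getAsDouble();
        }
        if (!current.isEmpty()) buckets.add(current);
        return buckets;
    }

    public static void main(String[] args) {
        // Files of ~10, ~100, and 1000 units end up in three buckets.
        System.out.println(bucket(Arrays.asList(10L, 12L, 11L, 100L, 110L, 1000L)));
    }
}
```

Each bucket then becomes one compaction task, which is why space is checked (and a GC possibly triggered) per task rather than once per compaction run.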
> SSTables are immutable on disk; compaction cannot delete data from them
> because they are simultaneously used to serve read requests. To do so would
> require locking around (regions of) the file.
> Also, as far as I understand, we cannot immediately delete files because
> other operations (including repair) may still be using them. The data in the
> pre-compaction files is just as correct as the data in the compacted file;
> the latter is simply more compact. So the easiest thing to do is let the JVM
> sort out whether anything else is still using them.
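The "let the JVM sort it out" idea can be sketched with a phantom reference: the collector enqueues the reference only once it has proven that nothing still holds the in-memory handle, at which point the obsolete file can be deleted safely. The class names and the retry loop below are illustrative, not Cassandra's actual implementation.

```java
// Sketch: defer deleting an obsolete SSTable file until GC proves that no
// reader still holds the in-memory handle for it.
import java.io.File;
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

public class DeferredDelete {
    static class SSTableHandle { }           // stands in for an open SSTable

    static class DeletingRef extends PhantomReference<SSTableHandle> {
        final File file;
        DeletingRef(SSTableHandle h, ReferenceQueue<SSTableHandle> q, File f) {
            super(h, q);
            this.file = f;
        }
    }

    // Create a handle over `obsolete`, drop it, and delete the file only
    // after the collector confirms the handle is unreachable.
    static boolean deleteWhenUnreferenced(File obsolete) throws InterruptedException {
        ReferenceQueue<SSTableHandle> queue = new ReferenceQueue<>();
        SSTableHandle handle = new SSTableHandle();
        DeletingRef ref = new DeletingRef(handle, queue, obsolete);

        handle = null;                       // last strong reference dropped
        DeletingRef dead = null;
        for (int i = 0; i < 50 && dead == null; i++) {
            System.gc();                     // hint; enqueues ref once unreachable
            dead = (DeletingRef) queue.remove(100);
        }
        // Safe to delete now: GC has shown no one references the handle.
        return dead == ref && dead.file.delete();
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("sstable", ".db");
        System.out.println(deleteWhenUnreferenced(f));
    }
}
```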
> Perhaps it could be improved by actively tracking which files are in use, so
> they could be deleted sooner. But for now, so long as unused space is freed
> when needed, it is working as designed, AFAIK.
> That's my understanding; hope it helps explain why it works that way.
> Aaron
> On 30 Mar 2011, at 13:32, Sheng Chen wrote:
> Yes.
> I think at least we could remove the tombstones from each SSTable first, and
> then do the merge.
> 2011/3/29 Karl Hiramoto <>
>> Would it be possible to improve the current compaction disk space issue by
>> compacting only a few SSTables at a time and then immediately deleting the
>> old ones? Looking at the logs, it seems like deletions of old SSTables take
>> longer than necessary.
>> --
>> Karl
