cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-604) Compactions might remove tombstones without removing the actual data
Date Mon, 07 Dec 2009 00:56:18 GMT


Jonathan Ellis commented on CASSANDRA-604:

That makes sense.  I agree that GCing tombstones only during major compactions (or other
compactions that happen to include all sstables) is the easiest fix.
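
A minimal sketch of that rule, to make the condition concrete. The class and method names are hypothetical and sstables are represented only by name; this is not Cassandra's actual compaction API, just the check "drop tombstones only when the compaction covers every sstable".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: names are illustrative, not taken from the Cassandra code base.
final class TombstoneGcPolicy {
    private TombstoneGcPolicy() {}

    /**
     * Tombstones may be dropped only when the compaction includes every
     * existing sstable. Otherwise an sstable left out of the compaction
     * could still hold the data that the tombstone is shadowing.
     */
    static boolean mayPurgeTombstones(Set<String> allSstables, List<String> toCompact) {
        return new HashSet<>(toCompact).containsAll(allSstables);
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("data-75MB", "ts-1", "ts-2", "ts-3", "ts-4");

        // Minor compaction of the four tombstone files only: must keep tombstones.
        System.out.println(mayPurgeTombstones(all, List.of("ts-1", "ts-2", "ts-3", "ts-4"))); // false

        // Compaction that happens to include all sstables: tombstones can go.
        System.out.println(mayPurgeTombstones(all,
                List.of("data-75MB", "ts-1", "ts-2", "ts-3", "ts-4"))); // true
    }
}
```

The check is deliberately conservative: it keeps tombstones in every partial compaction, even when no excluded sstable actually contains the shadowed data.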

> Compactions might remove tombstones without removing the actual data
> --------------------------------------------------------------------
>                 Key: CASSANDRA-604
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>         Environment: Cent-OS
>            Reporter: Ramzi Rabah
>             Fix For: 0.5
> I was looking at the compaction code and noticed that, during the normal course of
> Cassandra's operation, we call:
>            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>            {
>                if (sstables.size() < minThreshold)
>                {
>                    continue;
>                }
>                // otherwise, compact the sstables in this bucket...
>            }
> where getCompactionBuckets groups into buckets either very small files
> or files whose sizes are within 0.5x-1.5x of each other. A bucket is
> compacted only if it holds at least minThreshold files, which is 4 by default.
> So far so good. Now consider this scenario: I have an old entry that
> I inserted a long time ago and that was compacted into a 75MB file.
> There are fewer than 4 files of about 75MB. I do many deletes and end
> up with 4 extra sstable files filled with tombstones, each about 300MB.
> These 4 files are compacted together, and in the compaction code, if
> the tombstone is there we don't copy it over to the new file. Since
> we compacted the tombstone files but not the 75MB file, the tombstone
> is gone while the data is still intact in the 75MB file. If we
> compacted all the files together I don't think that would be a
> problem, but since we only compact those 4, this can leave deleted
> data in place.
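
The scenario above can be reproduced with a toy model. This is a sketch under stated assumptions, not Cassandra's code: an sstable is modeled as an in-memory map from key to cell, a tombstone is a cell with a null value, and the names TombstoneScenario, Cell, compact, and read are all hypothetical. Timing conditions for when a tombstone becomes eligible for removal are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the reported scenario; all names here are illustrative.
final class TombstoneScenario {
    /** A cell is either live data or a tombstone (value == null). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Merges the inputs, newest timestamp wins; optionally drops tombstones. */
    static Map<String, Cell> compact(List<Map<String, Cell>> inputs, boolean purgeTombstones) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> sstable : inputs) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                Cell existing = merged.get(e.getKey());
                if (existing == null || e.getValue().timestamp > existing.timestamp) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        if (purgeTombstones) {
            merged.values().removeIf(Cell::isTombstone);
        }
        return merged;
    }

    /** Reads a key across sstables; returns null if absent or deleted. */
    static String read(String key, List<Map<String, Cell>> sstables) {
        Cell newest = null;
        for (Map<String, Cell> sstable : sstables) {
            Cell c = sstable.get(key);
            if (c != null && (newest == null || c.timestamp > newest.timestamp)) {
                newest = c;
            }
        }
        return (newest == null || newest.isTombstone()) ? null : newest.value;
    }

    /** Compacts only the four tombstone files, then reads the deleted key. */
    static String readAfterPartialCompaction(boolean purgeTombstones) {
        Map<String, Cell> oldData = new HashMap<>(); // stands in for the 75MB file
        oldData.put("k", new Cell("old-value", 1L));

        List<Map<String, Cell>> tombstoneFiles = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Map<String, Cell> t = new HashMap<>();
            t.put("k", new Cell(null, 2L + i)); // delete issued after the insert
            tombstoneFiles.add(t);
        }

        // The old data file is not part of this compaction.
        Map<String, Cell> compacted = compact(tombstoneFiles, purgeTombstones);

        List<Map<String, Cell>> remaining = new ArrayList<>();
        remaining.add(oldData);
        remaining.add(compacted);
        return read("k", remaining);
    }

    public static void main(String[] args) {
        System.out.println(readAfterPartialCompaction(true));  // old-value (deleted data reappears)
        System.out.println(readAfterPartialCompaction(false)); // null (tombstone still shadows it)
    }
}
```

With tombstones purged during the partial compaction, the read falls through to the old data file and returns the deleted value. Keeping the tombstones in the compacted output preserves the delete.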

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
