cassandra-commits mailing list archives

From "Ramzi Rabah (JIRA)" <j...@apache.org>
Subject [jira] Created: (CASSANDRA-604) Compactions might remove tombstones without removing the actual data
Date Sat, 05 Dec 2009 00:13:20 GMT
Compactions might remove tombstones without removing the actual data
--------------------------------------------------------------------

                 Key: CASSANDRA-604
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-604
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.5
         Environment: Cent-OS
            Reporter: Ramzi Rabah
             Fix For: 0.5


I was looking at the compaction code and noticed that when we do
compactions during the normal course of Cassandra operation, we call:

           for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
           {
               // skip buckets that hold fewer files than the minimum threshold
               if (sstables.size() < minThreshold)
               {
                   continue;
               }
               // otherwise, compact the sstables in this bucket...
           }

where getCompactionBuckets groups into buckets either very small files
or files whose sizes are within 0.5x to 1.5x of each other. A bucket is
only compacted if it holds at least the minimum threshold of files,
which is 4 by default.
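
As a rough illustration of the bucketing rule described above, here is a
minimal sketch. It is not the actual getCompactionBuckets code: it works
on plain file sizes instead of SSTableReader objects, and the names
BucketSketch, bucketBySize, eligible and smallLimit are made up for this
example. It also assumes that the 50MB argument is the "very small file"
limit and that each bucket is compared by a running average size.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSketch
{
    // Hypothetical stand-in for getCompactionBuckets, operating on file sizes in bytes.
    static List<List<Long>> bucketBySize(List<Long> sizes, long smallLimit)
    {
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> averages = new ArrayList<>(); // running average size per bucket
        for (long size : sizes)
        {
            boolean placed = false;
            for (int i = 0; i < buckets.size() && !placed; i++)
            {
                long avg = averages.get(i);
                // "similar" means within 0.5x to 1.5x of the bucket's average size
                boolean similar = size > avg / 2 && size < 3 * avg / 2;
                // assumption: all files under smallLimit share a bucket
                boolean bothSmall = size < smallLimit && avg < smallLimit;
                if (similar || bothSmall)
                {
                    buckets.get(i).add(size);
                    averages.set(i, (avg + size) / 2);
                    placed = true;
                }
            }
            if (!placed)
            {
                // no matching bucket: start a new one with this file
                List<Long> bucket = new ArrayList<>();
                bucket.add(size);
                buckets.add(bucket);
                averages.add(size);
            }
        }
        return buckets;
    }

    // Buckets that would be compacted: those holding at least minThreshold files.
    static List<List<Long>> eligible(List<List<Long>> buckets, int minThreshold)
    {
        List<List<Long>> result = new ArrayList<>();
        for (List<Long> bucket : buckets)
        {
            if (bucket.size() >= minThreshold)
            {
                result.add(bucket);
            }
        }
        return result;
    }
}
```

With a threshold of 4, a bucket holding only two or three similar-sized
files is returned by bucketBySize but filtered out by eligible, so it is
left alone until more files of that size accumulate.
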
So far so good. Now consider this scenario: I have an old entry that
I inserted a long time ago and that was compacted into a 75MB file.
There are fewer than 4 files of around 75MB. I then do many deletes and
end up with 4 extra sstable files filled with tombstones, each about
300MB in size. These 4 files are compacted together, and in the
compaction code, if an entry is a tombstone we don't copy it over to
the new file. Since we compacted the tombstone files but not the 75MB
files, the tombstone is gone while the data it was supposed to delete
is still intact in the 75MB file. If we compacted all the files
together I don't think that would be a problem, but since we only
compact the 4 tombstone files, this potentially leaves deleted data
not cleaned up.
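
To make the scenario above concrete, here is a small self-contained
simulation. It is only a sketch of the behavior described in this
report, not Cassandra's real compaction code: sstables are modeled as
in-memory maps, and the names TombstoneSketch, Cell, compact and read
are hypothetical. It assumes that a compaction drops every tombstone it
sees and that a read returns the newest cell across all sstables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TombstoneSketch
{
    // A cell is either live data or a tombstone, stamped with a write time.
    static final class Cell
    {
        final String value;   // null marks a tombstone
        final long timestamp;

        Cell(String value, long timestamp)
        {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone()
        {
            return value == null;
        }
    }

    // Merge the given sstables, keeping the newest cell per key.
    // With dropTombstones set, tombstones are not copied to the output.
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables, boolean dropTombstones)
    {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> sstable : sstables)
        {
            for (Map.Entry<String, Cell> e : sstable.entrySet())
            {
                Cell current = merged.get(e.getKey());
                if (current == null || e.getValue().timestamp > current.timestamp)
                {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        if (dropTombstones)
        {
            merged.values().removeIf(Cell::isTombstone);
        }
        return merged;
    }

    // Read a key across all sstables; null means deleted or never written.
    static String read(List<Map<String, Cell>> sstables, String key)
    {
        Cell newest = compact(sstables, false).get(key);
        return (newest == null || newest.isTombstone()) ? null : newest.value;
    }

    public static void main(String[] args)
    {
        // The old 75MB file holding the original insert.
        Map<String, Cell> oldFile = new HashMap<>();
        oldFile.put("key1", new Cell("old value", 1L));

        // Four newer files that only hold tombstones for that key.
        List<Map<String, Cell>> tombstoneFiles = new ArrayList<>();
        for (long ts = 2L; ts <= 5L; ts++)
        {
            Map<String, Cell> file = new HashMap<>();
            file.put("key1", new Cell(null, ts));
            tombstoneFiles.add(file);
        }

        // Before compaction the tombstones hide the old value.
        List<Map<String, Cell>> before = new ArrayList<>(tombstoneFiles);
        before.add(oldFile);
        System.out.println("before compaction: " + read(before, "key1"));

        // Compact only the tombstone bucket, then read again.
        List<Map<String, Cell>> after = new ArrayList<>();
        after.add(oldFile);
        after.add(compact(tombstoneFiles, true));
        System.out.println("after compaction:  " + read(after, "key1"));
    }
}
```

Under these assumptions the first read reports the key as deleted, and
the second read returns the old value again, because the only files
that knew about the delete were compacted away without the 75MB file
taking part.
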

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

