cassandra-commits mailing list archives

From "Ramzi Rabah (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-604) Compactions might remove tombstones without removing the actual data
Date Mon, 07 Dec 2009 00:34:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786743#action_12786743 ]

Ramzi Rabah commented on CASSANDRA-604:
---------------------------------------

What I meant is that there are two compaction orderings that can lead to tombstones being
cleaned up while the data they cover is not:

Case 1) The one I described first: the data is in sstable 1 and the tombstone is in sstable 2.
Sstables 2-5 are compacted into sstable 6, and the tombstone is dropped along the way, so
sstable 6 has no tombstone. That leaves the data in sstable 1 and no tombstone in sstable 6
--> probably bad. In this case the tombstone is compacted away before the data is ever compacted.

Case 2) The data is in sstable 1 and the tombstone is in sstable 5. Sstables 1-4 are compacted
into sstable 6, which carries the data, so we now have the tombstone in sstable 5 and the data
in sstable 6. Then sstables 5, 7, 8, and 9 are compacted into sstable 10 and the tombstone is
dropped, leaving the data in sstable 6 and no tombstone in sstable 10 --> probably bad. In this
case the data is compacted first, then the tombstone.

Both cases can probably be fixed in the same manner; I just wanted to point out all the
scenarios I can think of that can cause this problem.
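
For illustration, a minimal sketch of one guard that would cover both orderings: drop a
tombstone during a compaction only if no sstable outside the compaction set might still hold
data for that key. This is not the actual Cassandra code; TombstonePurgeCheck, MockSSTable,
and mightContainKey are hypothetical names used only for the sketch.

import java.util.Set;

// Hypothetical sketch only: purge a tombstone during compaction solely when no
// sstable outside the compaction set could still hold data for the same key.
// This is not the Cassandra 0.5 API; MockSSTable and mightContainKey are made up.
public final class TombstonePurgeCheck
{
    /** Minimal stand-in for an sstable; only models key containment. */
    public interface MockSSTable
    {
        boolean mightContainKey(String key);
    }

    private TombstonePurgeCheck() {}

    /**
     * @param key                key whose tombstone is a purge candidate
     * @param allSSTables        every live sstable in the column family
     * @param compactingSSTables the subset being compacted right now
     * @return true only if every sstable that might contain the key is part of
     *         this compaction, so dropping the tombstone cannot orphan older data
     */
    public static boolean canPurgeTombstone(String key,
                                            Set<MockSSTable> allSSTables,
                                            Set<MockSSTable> compactingSSTables)
    {
        for (MockSSTable sstable : allSSTables)
        {
            if (!compactingSSTables.contains(sstable) && sstable.mightContainKey(key))
                return false; // data for this key may survive elsewhere; keep the tombstone
        }
        return true;
    }
}

With a check like this, Case 1 would keep the tombstone in sstable 6 because sstable 1 might
still hold the row, and Case 2 would keep it in sstable 10 because sstable 6 does.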




> Compactions might remove tombstones without removing the actual data
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-604
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-604
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>         Environment: Cent-OS
>            Reporter: Ramzi Rabah
>             Fix For: 0.5
>
>
> I was looking at the code for compaction, and noticed that when we are doing compactions
> during the normal course of Cassandra, we call:
>            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>            {
>                if (sstables.size() < minThreshold)
>                {
>                    continue;
>                }
>                // otherwise do the compaction...
> where getCompactionBuckets puts very small files into one bucket and otherwise buckets files
> that are within 0.5-1.5x of each other's size (a sketch of this bucketing follows the quoted
> report below). A bucket is only compacted if it holds at least minThreshold sstables, which is
> 4 by default.
> So far so good. Now how about this scenario: I have an old entry that I inserted a long time
> ago and that was compacted into a 75MB file. There are fewer than 4 files of that size, so
> that file never reaches the compaction threshold. I then do many deletes and end up with 4
> extra sstable files full of tombstones, each about 300 MB. Those 4 files are compacted
> together, and in the compaction code, if the tombstone is there we don't copy it over to the
> new file. Since we did not compact the 75MB file but we did compact the tombstone files, that
> leaves the tombstone gone but the data still intact in the 75MB file. If we compacted all the
> files together I don't think that would be a problem, but since we only compact 4 at a time,
> this potentially leaves data not cleaned up.
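
For reference, a rough sketch of the bucketing rule described in the report above: files within
0.5x-1.5x of a bucket's running average size join that bucket, and very small files (under the
50L * 1024L * 1024L threshold passed in the quoted snippet) all share one bucket. This is only
an approximation under those assumptions, not the actual getCompactionBuckets() implementation.

import java.util.ArrayList;
import java.util.List;

// Approximate sketch of the bucketing described above; not the real
// getCompactionBuckets() code from ColumnFamilyStore.
public final class CompactionBuckets
{
    private CompactionBuckets() {}

    public static List<List<Long>> getCompactionBuckets(List<Long> sstableSizes, long smallFileThreshold)
    {
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> averages = new ArrayList<>();   // running average size of each bucket
        List<Long> smallFiles = new ArrayList<>();  // all "tiny" files end up in one shared bucket

        for (long size : sstableSizes)
        {
            if (size < smallFileThreshold)
            {
                smallFiles.add(size);
                continue;
            }

            boolean placed = false;
            for (int i = 0; i < buckets.size(); i++)
            {
                long avg = averages.get(i);
                // within 0.5x-1.5x of this bucket's average size
                if (size > avg / 2 && size < avg * 3 / 2)
                {
                    buckets.get(i).add(size);
                    averages.set(i, (avg + size) / 2);
                    placed = true;
                    break;
                }
            }
            if (!placed)
            {
                List<Long> bucket = new ArrayList<>();
                bucket.add(size);
                buckets.add(bucket);
                averages.add(size);
            }
        }

        if (!smallFiles.isEmpty())
            buckets.add(smallFiles);
        return buckets;
    }
}

Per the snippet quoted above, a bucket is only compacted once it has at least minThreshold
(4 by default) members, which is why the lone 75MB file in the scenario never gets compacted
together with the tombstones that cover its data.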

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

