cassandra-commits mailing list archives

From "Ramzi Rabah (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-604) Compactions might remove tombstones without removing the actual data
Date Sun, 06 Dec 2009 18:04:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786662#action_12786662 ]

Ramzi Rabah commented on CASSANDRA-604:
---------------------------------------

Thinking about it some more, there is another case in which data can be lost. In the case above,
the file containing the tombstone was compacted by itself before the file containing the data.
The second case is the reverse: the file containing the data is compacted by itself before the
file containing the tombstone is.

So in both cases, the only viable solution I can think of is to remove tombstones only when
every single SSTable file for the column family is included in the compaction (i.e. a major
compaction). Otherwise, the tombstone should stick around.

Does that make sense?
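The rule proposed above can be sketched as a single predicate: a tombstone may be dropped only if the compaction covers every SSTable of the column family, since only then can every shadowed copy of the data be removed in the same pass. This is an illustrative sketch, not Cassandra's actual code; the names (shouldPurgeTombstones, isMajor) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class TombstonePurgeRule {
    // A compaction is "major" only if it includes every SSTable of the column family.
    static boolean isMajor(Set<String> allSSTables, Set<String> compacting) {
        return compacting.containsAll(allSSTables);
    }

    // Only a major compaction can see every shadowed copy of the deleted data,
    // so only then is it safe to drop the tombstone instead of copying it over.
    static boolean shouldPurgeTombstones(Set<String> allSSTables, Set<String> compacting) {
        return isMajor(allSSTables, compacting);
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>(Set.of("a.db", "b.db", "c.db"));
        Set<String> minor = new HashSet<>(Set.of("a.db", "b.db"));
        System.out.println(shouldPurgeTombstones(all, minor)); // false: keep tombstones
        System.out.println(shouldPurgeTombstones(all, all));   // true: safe to drop
    }
}
```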

> Compactions might remove tombstones without removing the actual data
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-604
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-604
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>         Environment: Cent-OS
>            Reporter: Ramzi Rabah
>             Fix For: 0.5
>
>
> I was looking at the code for compaction, and noticed that during Cassandra's normal
> operation, when we do compactions, we call:
>            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
>            {
>                if (sstables.size() < minThreshold)
>                {
>                    continue;
>                }
>                // otherwise, do the compaction...
> where getCompactionBuckets groups into buckets very small files, or files whose
> sizes are within 0.5x-1.5x of each other. It will only compact a bucket if it
> holds at least minThreshold files, which is 4 by default.
> So far so good. Now consider this scenario: I have an old entry that
> I inserted a long time ago and that was compacted into a 75 MB file,
> and there are fewer than 4 files of that size. I do many deletes, and end up with 4
> extra sstable files filled with tombstones, each about 300 MB.
> These 4 files are compacted together, and in the compaction code, if
> the tombstone is there we don't copy it over to the new file. Now
> since we did not compact the 75 MB file, but we did compact the
> tombstone files, that leaves us with the tombstone gone but
> the data still intact in the 75 MB file. If we compacted all the
> files together I don't think that would be a problem, but since we
> only compact 4, this potentially leaves data not cleaned.
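The bucketing behavior described in the report can be sketched as follows: files under 50 MB share one bucket, larger files join a bucket when they fall within 0.5x-1.5x of its average size, and only buckets with at least minThreshold (default 4) files get compacted. This is a simplified illustration of the scenario, not Cassandra's actual getCompactionBuckets implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionBuckets {
    static final long SMALL = 50L * 1024L * 1024L; // files under 50 MB share one bucket

    static List<List<Long>> getCompactionBuckets(List<Long> sizes) {
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> smallBucket = new ArrayList<>();
        for (long size : sizes) {
            if (size < SMALL) { smallBucket.add(size); continue; }
            boolean placed = false;
            for (List<Long> bucket : buckets) {
                // Join a bucket if this file is within 0.5x-1.5x of its average size.
                long avg = (long) bucket.stream().mapToLong(Long::longValue).average().orElse(0);
                if (size > avg / 2 && size < avg * 3 / 2) { bucket.add(size); placed = true; break; }
            }
            if (!placed) { List<Long> b = new ArrayList<>(); b.add(size); buckets.add(b); }
        }
        if (!smallBucket.isEmpty()) buckets.add(smallBucket);
        return buckets;
    }

    public static void main(String[] args) {
        int minThreshold = 4;
        // Scenario from the report: one 75 MB data file, four ~300 MB tombstone files.
        long mb = 1024L * 1024L;
        List<Long> sizes = List.of(75 * mb, 300 * mb, 310 * mb, 290 * mb, 305 * mb);
        for (List<Long> bucket : getCompactionBuckets(sizes)) {
            if (bucket.size() >= minThreshold)
                System.out.println("compacting bucket of " + bucket.size()); // the tombstone files
            else
                System.out.println("skipping bucket of " + bucket.size());   // the lone 75 MB file
        }
    }
}
```

Running this, the four ~300 MB tombstone files form a bucket that meets the threshold and gets compacted, while the 75 MB data file sits alone in a bucket that is skipped, which is exactly how the tombstones can be dropped while the shadowed data survives.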

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

