cassandra-commits mailing list archives

From "Jeremiah Jordan (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7764) RFC: Range movements will "wake up" previously invisible data
Date Wed, 13 Aug 2014 20:24:12 GMT


Jeremiah Jordan commented on CASSANDRA-7764:

I have definitely seen this bite people before.  I was thinking about this problem, and I wonder
whether a new type of "cleanup" would help here: one that, instead of dropping the unowned data,
anti-compacts it into a "cleaned data" subfolder.  That way you could back up, or even sstableload,
the "cleaned data" in case you find that something went wrong.  With that peace of mind,
you could then run cleanup without worrying about data loss.  After CASSANDRA-2434 you
shouldn't actually need the old stuff, but I could see it being nice to keep, especially for
paranoid people ;).
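
A rough sketch of that idea, as a toy model only: rows are a plain dict and
quarantine_cleanup and is_owned are hypothetical names, not anything that exists
in Cassandra. Unowned rows are split off and kept instead of being dropped.

```python
def quarantine_cleanup(rows, is_owned):
    """Split rows into (kept, quarantined) instead of discarding unowned ones.

    rows     -- dict of key -> value held by one node (toy stand-in for sstables)
    is_owned -- hypothetical predicate: does this node still own the key?
    """
    kept, quarantined = {}, {}
    for key, value in rows.items():
        # Owned rows stay live; unowned rows go to the "cleaned data" pile.
        target = kept if is_owned(key) else quarantined
        target[key] = value
    return kept, quarantined


# Example: the node used to own keys below 100 but now only owns keys below 50.
kept, quarantined = quarantine_cleanup({1: "a", 60: "b"}, lambda k: k < 50)
```

The quarantined rows could then be backed up or reloaded later if the range
movement turns out to have gone wrong.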

> RFC: Range movements will "wake up" previously invisible data
> -------------------------------------------------------------
>                 Key: CASSANDRA-7764
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Rick Branson
> Presumably this has been going on as long as Cassandra has existed, but wanted to capture
> it here since it came up in an IRC discussion. This issue will probably show up on any cluster
> Scenario:
> 1) Start with a 3-node cluster, RF=1
> 2) A 4th node is added to the cluster
> 3) Data is deleted on ranges belonging to 4th node
> 4) Wait for GC to clean up some tombstones on 4th node
> 5) 4th node removed from cluster
> 6) Deleted data will reappear since it was dormant on the original 3 nodes
> This could definitely happen in many other situations where dormant data could exist
> such as inconsistencies that aren't resolved before range movement, but the case above seemed
> the most reasonable to propose as a real-world problem.
> The cleanup operation can be used to get rid of the dormant data, but from my experience
> people don't run cleanup unless they're low on disk. It's definitely not a best practice for
> data integrity.
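
The scenario in the quoted report can be walked through with a toy simulation.
Everything here is an assumption made for illustration: nodes are dicts, keys
0..99 are split into fixed ranges, and the names owner and simulate are
hypothetical. It does not model Cassandra's real token ring, timestamps, or
compaction.

```python
TOMBSTONE = object()  # marker standing in for a delete record


def owner(key, n4_present):
    """Toy ownership rule with RF=1; n4 takes keys 50..66 from n2 when present."""
    if key < 34:
        return "n1"
    if key < 67:
        return "n4" if (n4_present and key >= 50) else "n2"
    return "n3"


def simulate(run_cleanup):
    # Start with a 3-node cluster, RF=1, holding keys 0..99.
    stores = {"n1": {}, "n2": {}, "n3": {}}
    for key in range(100):
        stores[owner(key, False)][key] = f"v{key}"

    # A 4th node joins: its range is streamed from n2, which keeps its copy.
    stores["n4"] = {k: v for k, v in stores["n2"].items()
                    if owner(k, True) == "n4"}
    if run_cleanup:
        # Cleanup drops the data n2 no longer owns.
        stores["n2"] = {k: v for k, v in stores["n2"].items()
                        if owner(k, True) == "n2"}

    # Key 55 is deleted; only n4 (the current owner) sees the delete.
    stores["n4"][55] = TOMBSTONE
    # GC later purges the tombstone, leaving no trace of the delete.
    del stores["n4"][55]

    # The 4th node leaves and streams what it holds back to n2.
    for k, v in stores.pop("n4").items():
        stores["n2"][k] = v

    # Read key 55 from whichever node owns it now.
    return stores[owner(55, False)].get(55)


resurrected = simulate(run_cleanup=False)  # dormant copy on n2 becomes visible
clean = simulate(run_cleanup=True)         # nothing left to resurrect
```

Without cleanup the read returns the old value for key 55; with cleanup run
after the 4th node joined, the read finds nothing.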

This message was sent by Atlassian JIRA
