cassandra-commits mailing list archives

From "Stefan Podkowinski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
Date Thu, 26 Jan 2017 12:07:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839622#comment-15839622 ]

Stefan Podkowinski commented on CASSANDRA-13153:
------------------------------------------------

Thanks for reporting this, [~Amanda.Debrot]! Let me try to sum up again what's happening here.

I think the assumption was that anti-compaction isolates repaired ranges into the repaired
set of sstables, while the parts of sstables not covered by the repair stay in the unrepaired
set. As described by Amanda, trouble starts when anti-compaction takes place exclusively
on already repaired sstables. Once we've finished repairing a certain range using full repair,
anti-compaction moves the unaffected ranges of overlapping sstables from the repaired set back
into the unrepaired set, even if those ranges have already been repaired before. As the overlap
between ranges and sstables is non-deterministic, we could see regular cells, tombstones,
or both being moved to unrepaired, depending on whether the sstable happens to overlap or not.
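
To make the effect concrete, here is a rough toy model in Python. It is not Cassandra code: the SSTable class, the anticompact and full_repair helpers, the token values, and the per-token overlap check are all assumptions made up for illustration. It only shows how an already repaired sstable that overlaps the repaired range gets its out-of-range partitions pushed back to unrepaired, while a non-overlapping sstable keeps its repaired flag.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable: a name, the tokens it holds, a repaired flag."""
    name: str
    tokens: FrozenSet[int]
    repaired: bool


def overlaps(sstable: SSTable, lo: int, hi: int) -> bool:
    # Simplification: check individual tokens against the range (lo, hi].
    return any(lo < t <= hi for t in sstable.tokens)


def anticompact(sstable: SSTable, lo: int, hi: int) -> Tuple[Optional[SSTable], Optional[SSTable]]:
    """Split an sstable around the repaired range (lo, hi].

    Tokens inside the range go to a repaired sstable, everything else goes to
    an unrepaired one, regardless of the flag the input sstable carried.
    """
    inside = frozenset(t for t in sstable.tokens if lo < t <= hi)
    outside = sstable.tokens - inside
    repaired_part = SSTable(sstable.name + "-in", inside, True) if inside else None
    unrepaired_part = SSTable(sstable.name + "-out", outside, False) if outside else None
    return repaired_part, unrepaired_part


def full_repair(sstables: List[SSTable], lo: int, hi: int) -> List[SSTable]:
    """Model the anti-compaction step that follows a full repair of (lo, hi]."""
    result = []
    for s in sstables:
        if not overlaps(s, lo, hi):
            result.append(s)  # no overlap: sstable and its flag are left alone
            continue
        result.extend(part for part in anticompact(s, lo, hi) if part is not None)
    return result


# Token 250 is the deleted key. Both sstables are already marked repaired, but
# the data sstable was compacted with token 10, which belongs to another range.
data = SSTable("data", frozenset({10, 250}), True)
tombstone = SSTable("tombstone", frozenset({250}), True)

# Full repair of a range that does not contain token 250.
result = full_repair([data, tombstone], 0, 100)
for s in result:
    print(s.name, sorted(s.tokens), "repaired" if s.repaired else "unrepaired")
```

In this model the data for token 250 ends up in an unrepaired sstable while its tombstone stays repaired, purely because one sstable happened to overlap the repaired range and the other did not.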


Unfortunately this is not the only way that this could happen. As described in CASSANDRA-9143,
compactions during repairs can prevent anti-compaction for individual sstables, so tombstones
and data could end up in different sets in this case as well.

bq.  I've only tested it on Cassandra version 2.2 but it most likely also affects all Cassandra
versions with incremental repair - like 2.1 and 3.0.

I think 2.1 should not be affected, as we started doing anti-compactions for full repairs
in 2.2.

> Reappeared Data when Mixing Incremental and Full Repairs
> --------------------------------------------------------
>
>                 Key: CASSANDRA-13153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction, Tools
>         Environment: Apache Cassandra 2.2
>            Reporter: Amanda Debrot
>              Labels: Cassandra
>         Attachments: log-Reappeared-Data.txt, Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and SizeTieredCompactionStrategy.  I've
only tested it on Cassandra version 2.2 but it most likely also affects all Cassandra versions
with incremental repair - like 2.1 and 3.0.
> When mixing incremental and full repairs, there are a few scenarios where the Data SSTable
is marked as unrepaired and the Tombstone SSTable is marked as repaired.  Then if it is past
gc_grace, and the tombstone and data have been compacted out on other replicas, the next incremental
repair will push the Data to other replicas without the tombstone.
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
> 	Node 1 has data and tombstone in separate SSTables.
> 	Node 2 has data and no tombstone.
> 	Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day so now we have tombstone on
each node.
> Some minor compactions have happened since so data and tombstone get merged to 1 SSTable
on Nodes 1 and 3.
> 	Node 1 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
> 	Node 2 has data and tombstone in separate SSTables.
> 	Node 3 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr). 
> Now there are 2 scenarios where the Data SSTable will get marked as "Unrepaired" while
the Tombstone SSTable will get marked as "Repaired".
> Scenario 1:
>         Since the Data and Tombstone SSTable have been marked as "Repaired" and anticompacted,
they have had minor compactions with other SSTables containing keys from other ranges.  During
full repair, if the last node to run it doesn't own this particular key in its partitioner
range, the Data and Tombstone SSTable will get anticompacted and marked as "Unrepaired".
Now in the next incremental repair, if the Data SSTable is involved in a minor compaction
during the repair but the Tombstone SSTable is not, the resulting compacted SSTable will be
marked "Unrepaired" and the Tombstone SSTable is marked "Repaired".
> Scenario 2:
>         Only the Data SSTable had minor compaction with other SSTables containing keys
from other ranges after being marked as "Repaired".  The Tombstone SSTable was never involved
in a minor compaction, so all keys in that SSTable belong to 1 particular partitioner
range. During full repair, if the last node to run it doesn't own this particular key in its
partitioner range, the Data SSTable will get anticompacted and marked as "Unrepaired".   The
Tombstone SSTable stays marked as "Repaired".
> Then it’s past gc_grace.  Since Nodes 1 and 3 each have only 1 SSTable for that key,
the data and tombstone will get compacted out together.
> 	Node 1 has nothing.
> 	Node 2 has data (in unrepaired SSTable) and tombstone (in repaired SSTable) in separate
SSTables.
> 	Node 3 has nothing.
> Now when the next incremental repair runs, it will only use the Data SSTable to build
the merkle tree since the tombstone SSTable is flagged as repaired and data SSTable is marked
as unrepaired.  And the data will get repaired against the other two nodes.
> 	Node 1 has data.
> 	Node 2 has data and tombstone in separate SSTables.
> 	Node 3 has data.
> If a read request hits Node 1 and 3, it will return data.  If it hits 1 and 2, or 2 and
3, however, it would return no data.
> Tested this with single range tokens for simplicity.
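
The last steps of the simplified scenario above can be replayed with a small Python sketch. This is a toy model under stated assumptions, not Cassandra's repair or read path: the node layout, the cell tuples, and the incremental_repair and read helpers are hypothetical, and the merkle tree comparison is reduced to a plain set union over unrepaired cells.

```python
# A cell is (kind, timestamp); the tombstone is newer than the data it deletes.
DATA = ("data", 1)
TOMBSTONE = ("tombstone", 2)


def new_node(unrepaired=(), repaired=()):
    """Hypothetical node state for one key: cells grouped by repaired status."""
    return {"unrepaired": set(unrepaired), "repaired": set(repaired)}


def incremental_repair(nodes):
    """Sync replicas using unrepaired cells only.

    Cells held in repaired sstables are ignored, mirroring the description
    that only unrepaired sstables are used to build the merkle tree.
    """
    union = set().union(*(n["unrepaired"] for n in nodes.values()))
    for n in nodes.values():
        n["unrepaired"] |= union


def read(nodes, replicas):
    """Merge the cells of the contacted replicas; the newest cell wins."""
    cells = set()
    for r in replicas:
        cells |= nodes[r]["unrepaired"] | nodes[r]["repaired"]
    if not cells:
        return None
    kind, _ = max(cells, key=lambda c: c[1])
    return "data" if kind == "data" else None


# State after gc_grace: nodes 1 and 3 compacted everything away, node 2 still
# holds the data (unrepaired) and the tombstone (repaired) in separate sstables.
nodes = {
    1: new_node(),
    2: new_node(unrepaired=[DATA], repaired=[TOMBSTONE]),
    3: new_node(),
}

incremental_repair(nodes)

for pair in ([1, 3], [1, 2], [2, 3]):
    print(pair, read(nodes, pair))
```

In this model only the read that touches nodes 1 and 3 returns the data; any read that includes node 2 still sees the tombstone and returns nothing, which matches the behaviour described in the report.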



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
