Date: Thu, 26 Jan 2017 12:07:25 +0000 (UTC)
From: "Stefan Podkowinski (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

    [ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839622#comment-15839622 ]

Stefan Podkowinski commented on CASSANDRA-13153:
------------------------------------------------

Thanks for reporting this, [~Amanda.Debrot]! Let me try to wrap up again what's happening here.

I think the assumption was that anti-compaction will isolate repaired ranges into the repaired set of sstables, while parts of sstables not covered by the repair will stay in the unrepaired set.
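To make that assumption concrete, here is a minimal, self-contained sketch of the expected split (this is not Cassandra's actual code; the SSTable model, the token-range tuples and the _overlaps() helper are simplifications made up for illustration):

{code:python}
from dataclasses import dataclass
from typing import List, Tuple

# Toy model: an sstable is just a set of token ranges plus a repairedAt marker.
@dataclass
class SSTable:
    ranges: List[Tuple[int, int]]   # token ranges covered by this sstable
    repaired_at: int = 0            # 0 == unrepaired, anything else == repaired

def _overlaps(r, repaired_ranges):
    lo, hi = r
    return any(not (hi <= r_lo or lo >= r_hi) for r_lo, r_hi in repaired_ranges)

def anticompact(sstables, repaired_ranges, repaired_at):
    """Split each sstable into the part covered by the just-repaired ranges
    (goes to the repaired set) and the rest (goes to the unrepaired set)."""
    repaired_set, unrepaired_set = [], []
    for s in sstables:
        covered = [r for r in s.ranges if _overlaps(r, repaired_ranges)]
        uncovered = [r for r in s.ranges if not _overlaps(r, repaired_ranges)]
        if covered:
            repaired_set.append(SSTable(covered, repaired_at))
        if uncovered:
            # This branch is taken regardless of whether the input sstable
            # was already marked repaired.
            unrepaired_set.append(SSTable(uncovered, repaired_at=0))
    return repaired_set, unrepaired_set
{code}

The trouble described below comes from that second branch: it also fires for sstables that are already marked repaired, so data or tombstones outside the just-repaired ranges get demoted back into the unrepaired set.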
As described by Amanda, trouble starts when anti-compaction takes place exclusively on already repaired sstables. Once we've finished repairing a certain range using full repair, anti-compaction will move unaffected ranges in overlapping sstables from the repaired back into the unrepaired set, even if those ranges have actually been repaired before. As the overlap between ranges and sstables is non-deterministic, we could see regular cells, tombstones or both being moved to unrepaired, depending on whether the sstable happens to overlap or not.

Unfortunately this is not the only way this can happen. As described in CASSANDRA-9143, compactions during the repair can prevent anti-compaction of individual sstables, so tombstones and data can end up in different sets in that case as well.

bq. I've only tested it on Cassandra version 2.2 but it most likely also affects all Cassandra versions with incremental repair - like 2.1 and 3.0.

I think 2.1 should not be affected, as we started doing anti-compactions for full repairs in 2.2.


> Reappeared Data when Mixing Incremental and Full Repairs
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-13153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction, Tools
>         Environment: Apache Cassandra 2.2
>            Reporter: Amanda Debrot
>              Labels: Cassandra
>         Attachments: log-Reappeared-Data.txt, Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2, but it most likely also affects all Cassandra versions with incremental repair, like 2.1 and 3.0.
> When mixing incremental and full repairs, there are a few scenarios where the Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as repaired. Then, if it is past gc_grace and the tombstone and data have been compacted out on other replicas, the next incremental repair will push the Data to the other replicas without the tombstone.
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
>     Node 1 has data and tombstone in separate SSTables.
>     Node 2 has data and no tombstone.
>     Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day, so now we have the tombstone on each node.
> Some minor compactions have happened since, so data and tombstone get merged into 1 SSTable on Nodes 1 and 3.
>     Node 1 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
>     Node 2 has data and tombstone in separate SSTables.
>     Node 3 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr).
> Now there are 2 scenarios where the Data SSTable will get marked as "Unrepaired" while the Tombstone SSTable will get marked as "Repaired".
> Scenario 1:
>         Since the Data and Tombstone SSTables have been marked as "Repaired" and anticompacted, they have had minor compactions with other SSTables containing keys from other ranges. During full repair, if the last node to run it doesn't own this particular key in its partitioner range, the Data and Tombstone SSTables will get anticompacted and marked as "Unrepaired".
> Now in the next incremental repair, if the Data SSTable is involved in a minor compaction during the repair but the Tombstone SSTable is not, the resulting compacted SSTable will be marked "Unrepaired" while the Tombstone SSTable stays marked "Repaired".
> Scenario 2:
>         Only the Data SSTable had a minor compaction with other SSTables containing keys from other ranges after being marked as "Repaired". The Tombstone SSTable was never involved in a minor compaction, therefore all keys in that SSTable belong to 1 particular partitioner range. During full repair, if the last node to run it doesn't own this particular key in its partitioner range, the Data SSTable will get anticompacted and marked as "Unrepaired". The Tombstone SSTable stays marked as "Repaired".
> Then it's past gc_grace. Since Nodes 1 and 3 only have 1 SSTable for that key, the tombstone will get compacted out.
>     Node 1 has nothing.
>     Node 2 has data (in unrepaired SSTable) and tombstone (in repaired SSTable) in separate SSTables.
>     Node 3 has nothing.
> Now when the next incremental repair runs, it will only use the Data SSTable to build the merkle tree, since the tombstone SSTable is flagged as repaired and the data SSTable is marked as unrepaired. And the data will get repaired against the other two nodes.
>     Node 1 has data.
>     Node 2 has data and tombstone in separate SSTables.
>     Node 3 has data.
> If a read request hits Nodes 1 and 3, it will return data. If it hits 1 and 2, or 2 and 3, however, it will return no data.
> Tested this with single range tokens for simplicity.
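For anyone reproducing the steps above, a quick way to see which sstables have ended up in the repaired vs. unrepaired set is to ask sstablemetadata for its "Repaired at" value (a rough sketch, assuming the sstablemetadata tool shipped with Cassandra 2.2 is on the PATH and that the data directory path, which is only a placeholder here, points at the table under test):

{code:python}
import glob
import subprocess

def repaired_status(data_dir):
    """Print repaired/unrepaired state for every sstable in data_dir."""
    for path in sorted(glob.glob(data_dir + "/*-Data.db")):
        out = subprocess.run(["sstablemetadata", path],
                             capture_output=True, text=True).stdout
        repaired_at = "?"
        for line in out.splitlines():
            if line.startswith("Repaired at"):
                repaired_at = line.split(":", 1)[1].strip()
                break
        status = "unrepaired" if repaired_at == "0" else "repaired"
        print("%-80s %s (Repaired at: %s)" % (path, status, repaired_at))

# Placeholder path -- adjust to the keyspace/table directory being tested.
repaired_status("/var/lib/cassandra/data/my_keyspace/my_table")
{code}

In the scenarios described above, the Tombstone SSTable would report a non-zero "Repaired at" value while the Data SSTable reports 0.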