Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Mon, 3 Feb 2014 14:20:13 +0000 (UTC)
From: "Marcus Eriksson (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12637095.1363298653424.27113.1391437213942@arcas>
In-Reply-To: <JIRA.12637095.1363298653424@arcas>
References: <JIRA.12637095.1363298653424@arcas>
Subject: [jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing
 already-repaired data by default
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494 ] 

Marcus Eriksson edited comment on CASSANDRA-5351 at 2/3/14 2:18 PM:
--------------------------------------------------------------------

More complete version now pushed to https://github.com/krummas/cassandra/tree/marcuse/5351
Lots of testing required, but I think it is mostly 'feature-complete'.

Repair flow is now:
# Repair coordinator sends out Prepare messages to all neighbors
# All involved parties figure out which sstables should be included in the repair: for a full repair, all sstables are included; otherwise only the ones with repairedAt set to 0. Note that we don't lock the sstables; if they are gone by the time we do anticompaction, that is fine - we will repair them next round.
# Repair coordinator prepares itself, waits until all neighbors have prepared, and then sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step #2
# Coordinator waits for replies and then sends AnticompactionRequests to all nodes
# If we are doing a full repair, we simply skip anticompaction.
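
To illustrate steps 2 and 6 of the flow above, here is a minimal sketch of the sstable selection. It is not code from the branch: SSTableInfo, RepairCandidates, select and needsAnticompaction are hypothetical names made up for this example, and the only facts taken from the comment are that a full repair includes every sstable, an incremental one includes only those with repairedAt set to 0, and a full repair skips anticompaction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an sstable; only the fields this sketch needs. */
final class SSTableInfo {
    static final long UNREPAIRED = 0L; // repairedAt == 0 means "never repaired"
    final String name;
    final long repairedAt;

    SSTableInfo(String name, long repairedAt) {
        this.name = name;
        this.repairedAt = repairedAt;
    }
}

/** Hypothetical helper, not a class from the Cassandra code base. */
final class RepairCandidates {
    /** Step 2: a full repair takes every sstable, an incremental one only the unrepaired. */
    static List<SSTableInfo> select(List<SSTableInfo> all, boolean fullRepair) {
        List<SSTableInfo> picked = new ArrayList<>();
        for (SSTableInfo s : all) {
            if (fullRepair || s.repairedAt == SSTableInfo.UNREPAIRED) {
                picked.add(s);
            }
        }
        return picked;
    }

    /** Step 6: anticompaction is skipped when doing a full repair. */
    static boolean needsAnticompaction(boolean fullRepair) {
        return !fullRepair;
    }
}
```

No locking is modelled here, matching the note in step 2: an sstable that disappears before anticompaction is simply picked up again in the next round.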

Notes:
* SSTables are tagged with repairedAt timestamps; compactions keep min(repairedAt) of the included sstables.
* nodetool repair defaults to use the old behaviour. Use --incremental to use the new repairs.
* anticompaction
  ** Split an sstable into two new ones: one with all keys that were in the repaired ranges and one with the unrepaired data.
  ** If the repaired ranges cover the entire sstable, we just rewrite the sstable metadata instead of splitting. This means that the optimal way to run incremental repairs is to not do partitioner range repairs etc.
* LCS
  ** We always first check whether there are any unrepaired sstables to do STCS on; if there are, we do that. The reasoning is that new data (which needs compaction) is unrepaired.
  ** We keep all sstables in the LeveledManifest, then filter out the unrepaired ones when getting compaction candidates etc.
* STCS
  ** Major compaction is done by taking the biggest set of sstables - so for a total major compaction, you will need to run nodetool compact twice.
  ** Minor compactions work the same way: the biggest set of sstables will be compacted.
* Streaming - A streamed SSTable keeps its repairedAt time.
* BulkLoader - Loaded sstables are unrepaired.
* Scrub - Set repairedAt to UNREPAIRED - since we can drop rows during scrub, the new sstable is not considered repaired.
* Upgradesstables - Keep repaired status
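
The min(repairedAt) rule and the anticompaction split from the notes above can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: AnticompactionSketch, Range, Split, mergedRepairedAt and anticompact are hypothetical names, tokens are modelled as plain longs, ranges are treated as (left, right] without wrap-around, and UNREPAIRED is assumed to be 0 as in step 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not a class from the Cassandra code base. */
final class AnticompactionSketch {
    static final long UNREPAIRED = 0L; // assumed sentinel: repairedAt == 0

    /** Simplified token range (left, right]; wrap-around is not handled. */
    static final class Range {
        final long left, right;
        Range(long left, long right) { this.left = left; this.right = right; }
        boolean contains(long token) { return token > left && token <= right; }
    }

    /** Result of splitting one sstable's tokens into two groups. */
    static final class Split {
        final List<Long> repaired = new ArrayList<>();
        final List<Long> unrepaired = new ArrayList<>();
        /** True when every token was covered, i.e. no real split is needed. */
        boolean fullyRepaired() { return unrepaired.isEmpty(); }
    }

    /** A compacted sstable keeps the minimum repairedAt of its inputs. */
    static long mergedRepairedAt(long... repairedAts) {
        if (repairedAts.length == 0) return UNREPAIRED;
        long min = Long.MAX_VALUE;
        for (long r : repairedAts) min = Math.min(min, r);
        return min;
    }

    /** Tokens inside any repaired range go to one group, the rest to the other. */
    static Split anticompact(List<Long> tokens, List<Range> repairedRanges) {
        Split split = new Split();
        for (long token : tokens) {
            boolean covered = false;
            for (Range r : repairedRanges) {
                if (r.contains(token)) { covered = true; break; }
            }
            (covered ? split.repaired : split.unrepaired).add(token);
        }
        return split;
    }
}
```

Because unrepaired is 0, taking the minimum means that mixing any unrepaired sstable into a compaction yields an unrepaired result. When fullyRepaired() is true, the sketch corresponds to the case where only the sstable metadata needs rewriting.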


was (Author: krummas):
More complete version now pushed to  https://github.com/krummas/cassandra/tree/marcuse/5351
Lots of testing required, but i think it is mostly 'feature-complete';

Repair flow is now:
# Repair coordinator sends out Prepare messages to all neighbors
# All involved parties figure out what sstables should be included in the repair (if full repair, all sstables are included) otherwise only the ones with repairedAt set to 0. Note that we don't do any locking of the sstables, if they are gone when we do anticompaction it is fine - we will repair them next round.
# Repair coordinator prepares itself and waits until all neighbors have prepared and sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step #2
# Coordinator waits for replies and then sends AnticompactionRequests to all nodes
# If we are doing full repair, we simply skip doing anticompaction.

notes;
* SSTables are tagged with repairedAt timestamps, compactions keep min(repairedAt) of the included sstables.
* nodetool repair defaults to use the old behaviour. Use --incremental to use the new repairs.
* anticompaction
  - Split an sstable in 2 new ones. One sstable with all keys that were in the repaired ranges and one with unrepaired data.
  - If the repaired ranges cover the entire sstable, we rewrite sstable metadata. This means that the optimal way to run incremental repairs is to not do partitioner range repairs etc.
* Compaction
  * LCS
    - We always first check if there are any unrepaired sstables to do STCS on, if there is, we do that. Reasoning being that new data (which needs compaction) is unrepaired.
    - We keep all sstables in the LeveledManifest, then filter out the unrepaired ones when getting compaction candidates etc.
  * STCS
    - Major compaction is done by taking the biggest set of sstables - so for a total major compaction, you will need to run nodetool compact twice.
    - Minors works the same way, the biggest set of sstables will be compacted.
* Streaming - A streamed SSTable keeps its repairedAt time.
* BulkLoader - Loaded sstables are unrepaired.
* Scrub - Set repairedAt to UNREPAIRED - since we can drop rows during repair new sstable is not repaired.
* Upgradesstables - Keep repaired status


> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Lyuben Todorov
>              Labels: repair
>             Fix For: 2.1
>
>         Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair.  (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired.  So we should segregate unrepaired sstables from the repaired ones.


--
This message was sent by Atlassian JIRA
(v6.1.5#6160)