Date: Mon, 3 Feb 2014 14:20:13 +0000 (UTC)
From: "Marcus Eriksson (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default

[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494 ]

Marcus Eriksson edited comment on CASSANDRA-5351 at 2/3/14 2:18 PM:
--------------------------------------------------------------------

More complete version now pushed to https://github.com/krummas/cassandra/tree/marcuse/5351

Lots of testing is still required, but I think it is mostly feature-complete.

Repair flow is now:
# The repair coordinator sends Prepare messages to all neighbors.
# All involved parties figure out which sstables should be included in the repair. For a full repair, all sstables are included; otherwise only the ones with repairedAt set to 0 (unrepaired). Note that we don't lock the sstables: if they are gone by the time we do anticompaction, that is fine, because we will repair them next round.
# The repair coordinator prepares itself, waits until all neighbors have prepared, and then sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step 2.
# The coordinator waits for replies and then sends AnticompactionRequests to all nodes.
# For a full repair, we simply skip anticompaction.

Notes:
* SSTables are tagged with repairedAt timestamps; a compaction keeps the minimum repairedAt of the sstables it includes.
* nodetool repair defaults to the old behaviour. Use --incremental to run the new incremental repairs.
* Anticompaction
** Splits an sstable into two new ones: one with all keys that were in the repaired ranges, and one with the unrepaired data.
** If the repaired ranges cover the entire sstable, we only rewrite the sstable metadata instead of splitting it. This means the optimal way to run incremental repairs is to avoid partitioner-range repairs and other repairs that cover only part of an sstable's ranges.
* LCS
** We always first check whether there are any unrepaired sstables to run STCS on; if there are, we do that. The reasoning is that new data (which needs compaction) is unrepaired.
** We keep all sstables in the LeveledManifest, then filter out the unrepaired ones when getting compaction candidates etc.
* STCS
** Major compaction takes the biggest set of sstables (repaired or unrepaired), so for a complete major compaction you need to run nodetool compact twice.
** Minor compactions work the same way: the biggest set of sstables is compacted.
* Streaming - A streamed SSTable keeps its repairedAt time.
* BulkLoader - Loaded sstables are unrepaired.
* Scrub - Sets repairedAt to UNREPAIRED, since scrub can drop rows and the new sstable can therefore not be considered repaired.
* Upgradesstables - Keeps the repaired status.

> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Lyuben Todorov
>              Labels: repair
>             Fix For: 2.1
>
>         Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
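The sstable bookkeeping described in the comment above can be sketched as a small toy model. It covers four rules: picking only unrepaired sstables for an incremental repair, keeping min(repairedAt) on compaction, anticompaction versus a metadata-only rewrite, and STCS compacting one set at a time. This is not Cassandra's code. Every name below (SSTable, select_for_repair, compaction_repaired_at, anticompact, stcs_major_set) is hypothetical. The sketch also makes these assumptions:
* An sstable is reduced to a set of integer tokens.
* A repaired range is (lo, hi].
* "Biggest set" is assumed to mean the set with more sstables.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Tuple

UNREPAIRED = 0  # repairedAt value for data that has never been repaired


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable: a name, its tokens, and repairedAt."""
    name: str
    tokens: FrozenSet[int]
    repaired_at: int = UNREPAIRED


def select_for_repair(sstables: List[SSTable], full: bool) -> List[SSTable]:
    """Full repair uses every sstable; incremental repair only unrepaired ones."""
    if full:
        return list(sstables)
    return [s for s in sstables if s.repaired_at == UNREPAIRED]


def compaction_repaired_at(inputs: List[SSTable]) -> int:
    """A compaction output keeps min(repairedAt) of its (non-empty) inputs, so a
    single unrepaired input makes the whole output unrepaired."""
    return min(s.repaired_at for s in inputs)


def in_range(token: int, rng: Tuple[int, int]) -> bool:
    lo, hi = rng
    return lo < token <= hi  # assumed (lo, hi]: start-exclusive, end-inclusive


def anticompact(sstable: SSTable, repaired: Tuple[int, int],
                repaired_at: int) -> List[SSTable]:
    """Split an sstable into a repaired part and an unrepaired part."""
    inside = frozenset(t for t in sstable.tokens if in_range(t, repaired))
    outside = sstable.tokens - inside
    if not outside:
        # The repaired range covers the whole sstable: only metadata changes.
        return [replace(sstable, repaired_at=repaired_at)]
    if not inside:
        # Nothing was repaired in this sstable: leave it untouched.
        return [sstable]
    return [
        SSTable(sstable.name + "-repaired", inside, repaired_at),
        SSTable(sstable.name + "-unrepaired", outside, UNREPAIRED),
    ]


def stcs_major_set(sstables: List[SSTable]) -> List[SSTable]:
    """Pick the bigger of the repaired/unrepaired sets (by count, an assumption;
    ties go to the repaired set), never a mix of both."""
    repaired = [s for s in sstables if s.repaired_at != UNREPAIRED]
    unrepaired = [s for s in sstables if s.repaired_at == UNREPAIRED]
    return repaired if len(repaired) >= len(unrepaired) else unrepaired


if __name__ == "__main__":
    # Toy data: "a" has never been repaired, "b" was repaired earlier.
    a = SSTable("a", frozenset({1, 5, 9}))
    b = SSTable("b", frozenset({2, 3}), repaired_at=100)
    print(select_for_repair([a, b], full=False))
    print(anticompact(a, (0, 6), 200))
    print(compaction_repaired_at([a, b]))
```

In this model, anticompacting "a" against the range (0, 6] yields a repaired sstable holding tokens {1, 5} and an unrepaired one holding {9}. An sstable that lies entirely inside the repaired range only has its repairedAt rewritten. The min(repairedAt) rule is also why compaction has to keep the two sets apart: compacting a repaired sstable together with an unrepaired one produces unrepaired output.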