Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Fri, 9 Aug 2013 18:20:48 +0000 (UTC)
From: "Jonathan Ellis (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12637095.1363298653424.34627.1376072448749@arcas>
In-Reply-To: <JIRA.12637095.1363298653424@arcas>
References: <JIRA.12637095.1363298653424@arcas>
Subject: [jira] [Commented] (CASSANDRA-5351) Avoid repairing
 already-repaired data by default
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735099#comment-13735099 ] 

Jonathan Ellis commented on CASSANDRA-5351:
-------------------------------------------

Since we have size-tiering-in-L0 in 2.0, maybe we could leverage that to make this sane with LCS: levels 1+ hold only already-repaired data, and unrepaired data hangs out in L0 until we can repair it.

The question is: does this become unacceptable if we lose a node for a few days (so we can't repair, and L0 grows increasingly large)?  WDYT [~tjake]?
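
To make the proposal concrete, here is a rough sketch of the gating rule, assuming a simple in-memory model. The names (SSTable, promotable, held_in_l0, l0_backlog_mb) are hypothetical and are not Cassandra's actual classes or API; the point is only that an sstable may leave L0 once it is marked repaired, and that the unrepaired backlog is what would pile up while a node is down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    # Hypothetical model of an sstable; not Cassandra's real metadata.
    name: str
    size_mb: int
    repaired: bool
    level: int = 0


def promotable(sstables):
    """L0 sstables that may move into L1+: only already-repaired ones."""
    return [s for s in sstables if s.level == 0 and s.repaired]


def held_in_l0(sstables):
    """L0 sstables that must stay in L0 because they are still unrepaired."""
    return [s for s in sstables if s.level == 0 and not s.repaired]


def l0_backlog_mb(sstables):
    """Total size of unrepaired data waiting in L0."""
    return sum(s.size_mb for s in held_in_l0(sstables))
```

With this model, l0_backlog_mb is the number to watch in the lost-node scenario: it only shrinks when a repair completes, so it keeps growing for as long as repair is blocked.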
                
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: repair
>             Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair.  (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired.  So we should segregate unrepaired sstables from the repaired ones.
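
As an illustration of the description above (not the eventual implementation), the bookkeeping could look roughly like this. All names here (SSTable, repair_candidates, mark_repaired, compaction_groups) are hypothetical; the sketch assumes each sstable carries a simple repaired flag.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SSTable:
    # Hypothetical model; the repaired flag is an assumption of this sketch.
    name: str
    repaired: bool = False


def repair_candidates(sstables):
    """Only sstables not yet repaired feed the next merkle tree build."""
    return [s for s in sstables if not s.repaired]


def mark_repaired(sstables, repaired_names):
    """Return a new list with the named sstables flagged as repaired."""
    return [
        replace(s, repaired=True) if s.name in repaired_names else s
        for s in sstables
    ]


def compaction_groups(sstables):
    """Split into (repaired, unrepaired) so compaction never mixes the two."""
    repaired = [s for s in sstables if s.repaired]
    unrepaired = [s for s in sstables if not s.repaired]
    return repaired, unrepaired
```

Compaction would then pick its inputs from one group at a time, so a compacted output is either entirely repaired or entirely unrepaired.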
