Date: Fri, 9 Aug 2013 18:20:48 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

    [ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735099#comment-13735099 ]

Jonathan Ellis commented on CASSANDRA-5351:
-------------------------------------------

Since we have size-tiering-in-L0 in 2.0, maybe we could leverage that to make this sane with LCS: Levels 1+ are only for already-repaired data, unrepaired data hangs out in L0 until we can repair.

The question is, is this unacceptable if we lose a node for a few days (and thus can't repair and L0 gets increasingly large)? WDYT [~tjake]?

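To make the proposal above concrete, here is a rough sketch of the policy in Python. It is only an illustration under stated assumptions: `SSTable`, `unrepaired`, `target_level`, and `mark_repaired` are hypothetical names invented for this example, not Cassandra classes or APIs, and the real compaction strategy would look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable; not Cassandra's real class."""
    name: str
    level: int
    repaired: bool


def unrepaired(sstables):
    """Return the sstables that the next repair would still need to cover."""
    return [s for s in sstables if not s.repaired]


def target_level(sstable, proposed_level):
    """Pin unrepaired data to L0; only repaired data may move to L1+."""
    return proposed_level if sstable.repaired else 0


def mark_repaired(sstables, names):
    """Return a new list with the named sstables flagged as repaired."""
    return [
        SSTable(s.name, s.level, s.repaired or s.name in names)
        for s in sstables
    ]


# Assumed example state: one repaired sstable in L1, two unrepaired in L0.
tables = [
    SSTable("a", 1, True),
    SSTable("b", 0, False),
    SSTable("c", 0, False),
]
```

In this sketch, `unrepaired(tables)` is also the set that keeps growing in L0 while a node is down and repair cannot run, which is the concern raised in the question above.
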
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: repair
>             Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira