Date: Tue, 4 Feb 2014 06:20:16 +0000 (UTC)
From: "Yuki Morishita (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

    [ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890438#comment-13890438 ]

Yuki Morishita commented on CASSANDRA-5351:
-------------------------------------------

bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node.

That's what I meant. Major compaction currently produces a single SSTable, and I think changing that behavior might confuse users. My opinion is to keep it as is, but .

Additional review comments:

* Does PrepareMessage need to carry around dataCenters?
  Only the coordinator sends out messages, so I think you can drop it (also from ParentRepairSession).
* Using the CF ID is preferable to using the keyspace name/CF name pair.
* PrepareMessage is sent per CF, which can produce a lot of round trips. Isn't one message per replica node enough?
* I think we need to clean up parentRepairSessions when something goes wrong. Otherwise the ParentRepairSession in the map keeps references to SSTables.

I just worked on the first one above, and the commit is here (on top of your branch): https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361

> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Lyuben Todorov
>              Labels: repair
>             Fix For: 2.1
>
>         Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
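
For readers following the thread, the idea in the quoted issue description can be sketched roughly as below. This is an illustration only, not Cassandra code and not the patch under review: the `SSTable` class, its `repaired_at` field, and the helper functions are all hypothetical names chosen for the example. It assumes each sstable carries a marker recording when it was last repaired, that repair only considers sstables without that marker, and that compaction never puts repaired and unrepaired sstables in the same bucket.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable; not a Cassandra class."""
    name: str
    repaired_at: Optional[int] = None  # None means "never repaired"

    @property
    def repaired(self) -> bool:
        return self.repaired_at is not None


def split_by_repair_status(
    sstables: Sequence[SSTable],
) -> Tuple[List[SSTable], List[SSTable]]:
    """Return (repaired, unrepaired), preserving input order."""
    repaired = [s for s in sstables if s.repaired]
    unrepaired = [s for s in sstables if not s.repaired]
    return repaired, unrepaired


def select_for_repair(sstables: Sequence[SSTable]) -> List[SSTable]:
    """Only sstables that are new since the last repair need repairing."""
    return split_by_repair_status(sstables)[1]


def mark_repaired(sstables: Sequence[SSTable], repaired_at: int) -> List[SSTable]:
    """Return copies flagged as repaired at the given (assumed) timestamp."""
    return [SSTable(s.name, repaired_at) for s in sstables]


def compaction_buckets(sstables: Sequence[SSTable]) -> List[List[SSTable]]:
    """Keep repaired and unrepaired sstables in separate compaction buckets."""
    repaired, unrepaired = split_by_repair_status(sstables)
    return [bucket for bucket in (repaired, unrepaired) if bucket]
```

For example, given `[SSTable("a", repaired_at=100), SSTable("b"), SSTable("c")]`, `select_for_repair` returns only `b` and `c`, and `compaction_buckets` keeps `a` apart from them, so a later compaction in this sketch cannot mix repaired data with non-repaired data.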