Date: Tue, 29 Nov 2016 13:14:59 +0000 (UTC)
From: "Marcus Eriksson (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas

    [ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705274#comment-15705274 ]

Marcus Eriksson commented on CASSANDRA-9143:
--------------------------------------------

Looks good in general. Comments:

* Rename the cleanup compaction task; the name is very confusing given the existing cleanup compactions.
* Should we prioritize the pending-repair-cleanup compactions?
** If we don't, we might compare different datasets: a repair fails halfway through, one node happens to move the pending data to unrepaired, the operator retriggers the repair, and we end up comparing different datasets. If we instead move the data back as quickly as possible, we minimize this window.
** It would also help the next normal compactions, as we might be able to include more sstables in the repaired/unrepaired strategies.
* Is there any point in doing anticompaction after repair with -full repairs? Can we always do consistent repairs? We would need to anticompact already-repaired sstables into pending, but that should not be a big problem?
* In CompactionManager#getSSTablesToValidate we still mark all unrepaired sstables as repairing; we don't need to do that for consistent repairs. And if we can do consistent repair for -full as well, all of that code can be removed.
* In handleStatusRequest, if we don't have the local session, we should probably return that the session has failed?
* Fixed some minor nits here: https://github.com/krummas/cassandra/commit/24ef8b2f6df98431d66519ee12452df3db84fd7d

> Improving consistency of repairAt field across replicas
> --------------------------------------------------------
>
>                 Key: CASSANDRA-9143
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Blake Eggleston
>
> We currently send an anticompaction request to all replicas. During this, a node will split sstables and mark the appropriate ones repaired.
> The problem is that this can fail on some replicas for many reasons, leading to problems in the next repair.
> This is what I am suggesting to improve it.
> 1) Send an anticompaction request to all replicas. This can be done at session level.
> 2) During anticompaction, sstables are split but not marked repaired.
> 3) When we get a positive ack from all replicas, the coordinator will send another message called markRepaired.
> 4) On getting this message, replicas will mark the appropriate sstables as repaired.
> This will reduce the window of failure. We can also think of "hinting" the markRepaired message if required.
> Also, the sstables which are streaming can be marked as repaired like it is done now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
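
The four-step flow in the quoted description (anticompact, collect acks, send markRepaired, mark repaired) can be sketched roughly as below. This is a minimal illustration in Python, not Cassandra code: `Replica`, `SSTableState`, `coordinate`, and `release_pending` are hypothetical names, and the failure path (moving pending sstables back to unrepaired) is an assumption based on the comment above, since the description does not spell it out.

```python
from dataclasses import dataclass, field
from enum import Enum


class SSTableState(Enum):
    UNREPAIRED = "unrepaired"
    PENDING = "pending"      # split out by anticompaction, not yet marked repaired
    REPAIRED = "repaired"


@dataclass
class Replica:
    """Hypothetical stand-in for a replica node; not a Cassandra class."""
    name: str
    fail_anticompaction: bool = False
    sstables: dict = field(default_factory=dict)  # sstable name -> SSTableState

    def anticompact(self, in_range):
        """Steps 1-2: split out the repaired range, but only park it as PENDING."""
        if self.fail_anticompaction:
            return False  # negative ack
        for name in in_range:
            if self.sstables.get(name) is SSTableState.UNREPAIRED:
                self.sstables[name] = SSTableState.PENDING
        return True  # positive ack

    def mark_repaired(self):
        """Step 4: on markRepaired, promote everything pending to REPAIRED."""
        for name, state in self.sstables.items():
            if state is SSTableState.PENDING:
                self.sstables[name] = SSTableState.REPAIRED

    def release_pending(self):
        """Assumed failure path: move pending data back to UNREPAIRED."""
        for name, state in self.sstables.items():
            if state is SSTableState.PENDING:
                self.sstables[name] = SSTableState.UNREPAIRED


def coordinate(replicas, in_range):
    """Step 3: send markRepaired only if every replica acked anticompaction."""
    acks = [replica.anticompact(in_range) for replica in replicas]
    if all(acks):
        for replica in replicas:
            replica.mark_repaired()
        return True
    for replica in replicas:
        replica.release_pending()
    return False


if __name__ == "__main__":
    nodes = [Replica(n, sstables={"sst-1": SSTableState.UNREPAIRED}) for n in ("a", "b", "c")]
    print(coordinate(nodes, ["sst-1"]))  # True: every replica acked
```

The point of the sketch is that no replica reaches REPAIRED unless all of them completed anticompaction, which is what narrows the window in which replicas disagree about repaired data.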