Date: Fri, 15 May 2015 00:13:01 +0000 (UTC)
From: "Yuki Morishita (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-9097) Repeated incremental nodetool repair results in failed repairs due to running anticompaction

[ https://issues.apache.org/jira/browse/CASSANDRA-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-9097:
--------------------------------------
    Attachment: 0001-Remove-parent-session-on-remotes-when-repair-fails.patch

When a repair session fails, we currently remove only the coordinator's parent repair session. The parent repair session is removed only when an exception is thrown from ANTIENTROPY_STAGE, but validation and streaming happen on separate threads, so we have to clean them up separately.
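A simplified model of the cleanup described above may help. This is a hypothetical sketch, not the attached patch: the classes `ParentSessionCleanup` and `Node`, the constant `CLEANUP_MIN_VERSION`, and the rule that any lingering parent session blocks a new repair are all illustrative assumptions. It shows the coordinator removing the parent session locally *and* notifying every participant, gated on a version check so that older nodes are never sent a message they do not understand.

```java
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model: names here are illustrative, not Cassandra's actual API.
public class ParentSessionCleanup {

    // Assumed minimum messaging version that understands the cleanup message.
    static final int CLEANUP_MIN_VERSION = 2;

    static final class Node {
        final String name;
        final int messagingVersion;
        // Parent repair sessions this node still considers in progress.
        final Set<UUID> parentSessions = ConcurrentHashMap.newKeySet();

        Node(String name, int messagingVersion) {
            this.name = name;
            this.messagingVersion = messagingVersion;
        }

        void prepare(UUID parentSessionId) { parentSessions.add(parentSessionId); }

        // Handler for the (hypothetical) cleanup message.
        void onCleanup(UUID parentSessionId) { parentSessions.remove(parentSessionId); }

        // Simplifying assumption: a lingering parent session blocks a new repair.
        boolean canStartNewRepair() { return parentSessions.isEmpty(); }
    }

    // Called on the coordinator when validation or streaming fails on any thread.
    static void onRepairFailure(UUID parentSessionId, Node coordinator, List<Node> participants) {
        // The local removal: on its own, this leaves remote sessions behind.
        coordinator.parentSessions.remove(parentSessionId);
        for (Node remote : participants) {
            // Only nodes that pass the version check receive the new message.
            if (remote.messagingVersion >= CLEANUP_MIN_VERSION) {
                remote.onCleanup(parentSessionId);
            }
        }
    }

    public static void main(String[] args) {
        Node coordinator = new Node("n1", 2);
        Node upToDate = new Node("n2", 2);
        Node legacy = new Node("n3", 1);
        List<Node> remotes = List.of(upToDate, legacy);

        UUID session = UUID.randomUUID();
        coordinator.prepare(session);
        remotes.forEach(n -> n.prepare(session));

        // Simulate a failure reported from a validation or streaming thread.
        onRepairFailure(session, coordinator, remotes);

        System.out.println(coordinator.canStartNewRepair()); // true
        System.out.println(upToDate.canStartNewRepair());    // true
        System.out.println(legacy.canStartNewRepair());      // false: never sent cleanup
    }
}
```

The legacy node keeps its stale session in this model, which is the trade-off of gating on version: mixed-version clusters stay safe on the wire, at the cost of cleanup only reaching upgraded nodes.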
I introduced a new CleanupMessage and send it only to nodes that pass the version check, so adding a new message should be fine. Note that this is not an issue for 2.2+, since there we send the successfully repaired ranges, though we still need to add the new message to trunk for compatibility.

I will (try to) write a dtest to cover this scenario, but I am submitting the patch for review first.

> Repeated incremental nodetool repair results in failed repairs due to running anticompaction
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9097
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9097
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 2.2 beta 1, 2.1.6
>
>         Attachments: 0001-Remove-parent-session-on-remotes-when-repair-fails.patch, 0001-Wait-for-anticompaction-to-finish.patch
>
>
> I'm trying to synchronize incremental repairs over multiple nodes in a Cassandra cluster, and it does not seem to be easily achievable.
> In principle, the process iterates through the nodes of the cluster and performs `nodetool -h $NODE repair --incremental`, but that sometimes fails on subsequent nodes. The reason seems to be that the repair returns as soon as the repair and the _local_ anticompaction have completed, but does not guarantee that remote anticompactions are complete. If I subsequently try to issue another repair command, it fails to start (and terminates with a failure after about one minute). This usually isn't a problem, as the local anticompaction typically involves as much (or more) data as the remote ones, but sometimes it is.
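One of the attachments is named 0001-Wait-for-anticompaction-to-finish.patch. The general idea behind that name can be sketched as: the coordinator should report completion only once every participant's anticompaction is done, not just the local one. The sketch below is an assumption-laden illustration, not that patch's contents: `WaitForAnticompaction`, `anticompact`, `repair`, and the `running` counter are hypothetical, and anticompaction is simulated with `Thread.sleep` using arbitrary durations (the remote ones deliberately longer than the local one, matching the reported failure case).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical sketch: the coordinator returns only after ALL anticompactions finish.
public class WaitForAnticompaction {

    // Simulated count of anticompactions that have started but not finished.
    static final AtomicInteger running = new AtomicInteger();

    // Simulated anticompaction on one node; the sleep stands in for real work.
    static CompletableFuture<Void> anticompact(String node, long millis, ExecutorService pool) {
        running.incrementAndGet(); // counted before the task is submitted
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet(); // runs before the future completes
            }
        }, pool);
    }

    // Waits on local and remote anticompactions alike before returning.
    static void repair(List<String> nodes, ExecutorService pool) {
        List<CompletableFuture<Void>> all = nodes.stream()
                .map(n -> anticompact(n, n.equals("local") ? 10 : 100, pool))
                .collect(Collectors.toList());
        CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        repair(List.of("local", "remote1", "remote2"), pool);
        // Nothing is still anticompacting, so the next incremental repair can start.
        System.out.println("running after repair: " + running.get()); // 0
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

Joining only the local future would let `repair` return while the slower remote tasks are still running, which is the situation the reporter's node-by-node loop runs into.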