Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Fri, 15 May 2015 00:13:01 +0000 (UTC)
From: "Yuki Morishita (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12787647.1427971881000.121784.1431648781970@Atlassian.JIRA>
In-Reply-To: <JIRA.12787647.1427971881000@Atlassian.JIRA>
References: <JIRA.12787647.1427971881000@Atlassian.JIRA>
 <JIRA.12787647.1427971881127@arcas>
Subject: [jira] [Updated] (CASSANDRA-9097) Repeated incremental nodetool
 repair results in failed repairs due to running anticompaction
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/CASSANDRA-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-9097:
--------------------------------------
    Attachment: 0001-Remove-parent-session-on-remotes-when-repair-fails.patch

When a repair session fails, we only remove the parent repair session on the coordinator.
Currently, the parent repair session is removed only when an exception is thrown from ANTIENTROPY_STAGE, but validation and streaming happen on separate threads, so we have to clean those up separately.

I introduced a new CleanupMessage and only send it to the nodes that pass the version check, so adding the new message should be fine.
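
The version gating can be illustrated with a minimal sketch. This is not the code from the attached patch: `Endpoint`, `MIN_CLEANUP_VERSION`, and `cleanup_targets` are hypothetical names, the version number is a placeholder, and the sketch assumes the check means "the remote node's messaging version is new enough to understand the message".

```python
from dataclasses import dataclass

# Hypothetical placeholder: lowest messaging version assumed to understand
# the cleanup message. The real threshold is defined by the patch.
MIN_CLEANUP_VERSION = 8


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for a repair participant."""
    address: str
    messaging_version: int


def cleanup_targets(participants, min_version=MIN_CLEANUP_VERSION):
    """Return only the participants that pass the version check."""
    return [ep for ep in participants if ep.messaging_version >= min_version]
```

Filtering the recipients up front means older nodes never receive a message type they do not know, which is why adding the message should not break mixed-version clusters.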

Note that this is not an issue for 2.2+, since there we send the succeeded repair ranges, though we still need to add the new message to trunk for compatibility.

I will (try to) write a dtest to cover this scenario, but I am submitting the patch first for review.

> Repeated incremental nodetool repair results in failed repairs due to running anticompaction
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9097
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9097
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 2.2 beta 1, 2.1.6
>
>         Attachments: 0001-Remove-parent-session-on-remotes-when-repair-fails.patch, 0001-Wait-for-anticompaction-to-finish.patch
>
>
> I'm trying to synchronize incremental repairs over multiple nodes in a Cassandra cluster, and it does not seem to be easily achievable.
> In principle, the process iterates through the nodes of the cluster and performs `nodetool -h $NODE repair --incremental`, but that sometimes fails on subsequent nodes. The reason for the failure seems to be that the repair returns as soon as the repair and the _local_ anticompaction have completed, but does not guarantee that remote anticompactions are complete. If I subsequently try to issue another repair command, it fails to start (and terminates with failure after about one minute). It usually isn't a problem, as the local anticompaction typically involves as much (or more) data as the remote ones, but sometimes not.
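
The process described in the report can be sketched roughly as below. Only the `nodetool -h $NODE repair --incremental` command comes from the report; the function names and the stop-at-first-failure behavior are assumptions made for illustration.

```python
import subprocess


def run_repair(node):
    """Run the incremental repair command from the report against one node."""
    result = subprocess.run(["nodetool", "-h", node, "repair", "--incremental"])
    return result.returncode == 0


def repair_cluster(nodes, repair=run_repair):
    """Repair nodes one at a time.

    Returns the first node whose repair failed, or None if all succeeded.
    """
    for node in nodes:
        if not repair(node):
            return node
    return None
```

With the behavior reported here, a loop like this may stop on a later node even though the earlier repair returned successfully, because a remote anticompaction from the previous run could still be in progress.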


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)