cassandra-commits mailing list archives

From "Yuki Morishita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9097) Repeated incremental nodetool repair results in failed repairs due to running anticompaction
Date Fri, 03 Apr 2015 02:54:54 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393989#comment-14393989 ]

Yuki Morishita commented on CASSANDRA-9097:
-------------------------------------------

The problem is that the repair coordinator does not wait for anticompaction to finish on
the other nodes.
We can change the behavior so that the coordinator waits until it receives a notification
from the other replicas, but doing so can cause problems between nodes running different
minor versions.

We definitely need to fix this in 3.0, though let me see what would be the right solution
for 2.1.x.

> Repeated incremental nodetool repair results in failed repairs due to running anticompaction
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9097
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9097
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 2.1.5
>
>
> I'm trying to synchronize incremental repairs over multiple nodes in a Cassandra cluster,
> and it does not seem to be easily achievable.
> In principle, the process iterates through the nodes of the cluster and performs `nodetool
> -h $NODE repair --incremental`, but that sometimes fails on subsequent nodes. The reason for
> the failure seems to be that the repair returns as soon as the repair and the _local_
> anticompaction have completed, but it does not guarantee that remote anticompactions are
> complete. If I subsequently try to issue another repair command, it fails to start (and
> terminates with failure after about one minute). It usually isn't a problem, as the local
> anticompaction typically involves as much (or more) data as the remote ones, but sometimes
> not.
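
The node-by-node process described in the report could be sketched roughly as follows. This
is only an illustration of a client-side workaround (retrying a node whose repair fails to
start), not the fix discussed in this ticket. The `nodetool` invocation is the one quoted in
the report; the function names, the retry count, and the delay are assumptions made for the
example.

```python
import subprocess
import time


def run_nodetool_repair(node):
    """Run the incremental repair command from the report against one node.

    Returns True when nodetool exits with status 0.
    """
    result = subprocess.run(["nodetool", "-h", node, "repair", "--incremental"])
    return result.returncode == 0


def repair_all(nodes, run=run_nodetool_repair, max_attempts=5,
               delay_seconds=60.0, sleep=time.sleep):
    """Repair nodes one at a time, retrying a node whose repair fails.

    max_attempts and delay_seconds are illustrative values, not recommendations.
    Returns the list of nodes that still failed after max_attempts tries.
    """
    failed = []
    for node in nodes:
        for attempt in range(1, max_attempts + 1):
            if run(node):
                break  # repair succeeded, move on to the next node
            if attempt < max_attempts:
                # Give remote anticompactions time to finish before retrying.
                sleep(delay_seconds)
        else:
            # Loop ended without a successful run.
            failed.append(node)
    return failed
```

Calling `repair_all(["node1", "node2"])` with hypothetical host names would repair each node
in turn and return the hosts whose repair never succeeded. The `run` and `sleep` parameters
exist only so the loop can be exercised without a live cluster.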



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
