cassandra-commits mailing list archives

From "Marcus Olsson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
Date Thu, 29 Oct 2015 09:04:28 GMT


Marcus Olsson commented on CASSANDRA-10070:

Just to clarify, the automatic scheduling is done at the node level. Work is distributed by
having each node "compete" with the other nodes over which one has the highest need for a repair;
the winner then uses a CAS lock to obtain the right to run the repair. So the repair process would
continue during an upgrade, but I assume it would currently fail and the repair job would be
retried. The problem is that this job would keep trying to run until it succeeded, since it has
the highest priority, even if other repair jobs could run instead (e.g. if only part of the
cluster had been upgraded).
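A minimal Python sketch of the node-level scheduling described above. It is an illustration under stated assumptions, not the actual patch. The "repair need" metric (how overdue a node's most overdue table is) is hypothetical. `CasLock` is an in-memory stand-in for a CAS (lightweight transaction) lock row, and all names here are invented for the example.

```python
import threading
from dataclasses import dataclass


class CasLock:
    """Hypothetical in-memory stand-in for a CAS lock row.

    A real cluster would rely on Cassandra's lightweight transactions;
    here a mutex only emulates the atomic compare-and-set."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._holder = None

    def compare_and_set(self, expected, new):
        with self._mutex:
            if self._holder == expected:
                self._holder = new
                return True
            return False

    def release(self, node_name):
        return self.compare_and_set(node_name, None)


@dataclass
class Node:
    name: str
    last_repaired: dict  # table name -> timestamp of last successful repair

    def repair_need(self, now, interval):
        # Hypothetical priority: how far past its repair interval the most overdue table is.
        return max((now - ts - interval for ts in self.last_repaired.values()),
                   default=float("-inf"))


def try_run_repair(node, peers, lock, now, interval):
    """Return True if this node won the competition, took the lock and 'ran' a repair."""
    mine = node.repair_need(now, interval)
    if mine <= 0:
        return False  # nothing is due yet
    # Back off if any peer has a higher need (ties broken by node name).
    for peer in peers:
        if (peer.repair_need(now, interval), peer.name) > (mine, node.name):
            return False
    # Only one node in the cluster may hold the repair lock at a time.
    if not lock.compare_and_set(None, node.name):
        return False
    try:
        pass  # ... the actual repair would run here ...
        return True
    finally:
        lock.release(node.name)
```

This sketch also reproduces the problem described in the comment. If the repair itself fails, the same node keeps the highest need and would win again on every retry.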

To allow repairs during an upgrade scenario I think we need to have both CASSANDRA-7530 &
CASSANDRA-8110 in place.
Until then I see two options:
* Make it possible to "pause" all repair scheduling, e.g. during upgrade scenarios.
* Make the repair job recognize that it cannot run at this time and allow another repair job
to run instead.

I wouldn't mind implementing both options, since there might be scenarios where both are needed,
even once we are able to repair between versions.
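The two options listed above could be combined in a scheduler roughly like the following minimal Python sketch. It is hypothetical: `RepairJob`, `can_run` and `RepairScheduler` are invented names, and `can_run` stands in for whatever check would detect that a job cannot currently run (for example, replicas on mixed versions).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RepairJob:
    table: str
    priority: float
    can_run: Callable[[], bool]  # hypothetical check, e.g. "all replicas on the same version"


class RepairScheduler:
    def __init__(self):
        # Option 1: an operator-controlled pause, e.g. for the duration of an upgrade.
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def next_job(self, jobs: List[RepairJob]) -> Optional[RepairJob]:
        if self.paused:
            return None
        # Option 2: walk the jobs from highest to lowest priority and pick the first
        # one that can run now, instead of retrying the top job until it succeeds.
        for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
            if job.can_run():
                return job
        return None
```

Usage: with a blocked high-priority job and a runnable lower-priority one, `next_job` returns the lower-priority job. While the scheduler is paused it returns `None`.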

> Automatic repair scheduling
> ---------------------------
>                 Key: CASSANDRA-10070
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>             Fix For: 3.x
> Scheduling and running repairs in a Cassandra cluster is most often a required task,
but it can be hard for new users and it also requires some manual configuration.
There are good tools out there that can simplify this, but wouldn't this be a
good feature to have inside Cassandra? Cassandra could automatically schedule and run repairs,
so that when you start up your cluster it basically maintains itself in terms of normal
anti-entropy, with the possibility of manual configuration.

This message was sent by Atlassian JIRA
