cassandra-commits mailing list archives

From "Marcus Olsson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
Date Mon, 07 Dec 2015 09:29:11 GMT


Marcus Olsson commented on CASSANDRA-10070:

[~zemeyer] I've added the possibility to schedule a job remotely, so that one node can tell
another node to run a certain job. Right now it's used when a node discovers that another
node has been down longer than the hint window, and then tells that node to repair
its ranges ASAP. The remote scheduling uses the distributed locking mechanism to prevent
multiple nodes from telling the same node to run the repair at the same time.
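
As a rough illustration of the locking idea only (not the code in the branch), the sketch below guards the "tell node X to repair" step with a lock keyed on the target node. `InMemoryLockTable`, `schedule_remote_repair` and the `remote-repair:` resource name are all hypothetical, and the in-memory table merely stands in for whatever distributed lock the patch actually uses.

```python
import threading


class InMemoryLockTable:
    """Hypothetical stand-in for a distributed lock; a real one is shared by the cluster."""

    def __init__(self):
        self._owners = {}
        self._guard = threading.Lock()

    def try_acquire(self, resource, owner):
        # Succeeds only if nobody currently holds the resource.
        with self._guard:
            if resource in self._owners:
                return False
            self._owners[resource] = owner
            return True

    def release(self, resource, owner):
        # Only the current holder may release the resource.
        with self._guard:
            if self._owners.get(resource) == owner:
                del self._owners[resource]


def schedule_remote_repair(locks, sender, target, send):
    """Ask `target` to repair, unless another node holds the lock for that target."""
    resource = "remote-repair:" + target  # assumed naming scheme
    if not locks.try_acquire(resource, sender):
        return False
    try:
        send(target)
        return True
    finally:
        locks.release(resource, sender)
```

If node B already holds the lock for node A, a call from node C returns `False` without sending anything. Since this sketch releases the lock right after sending, a real implementation would presumably also need to check whether a job was already scheduled.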

So a simple flow could be:
Node A goes down at 12:00
Node B recognizes it and saves "Node A DOWN @ 12:00" locally
Node A comes back up at 16:00
Node B sees Node A as online again at 16:00 and sees that Node A has been down since 12:00,
i.e. for 4 hours.
Node B sends a repair job to Node A for each table that has a hint window that is 4 hours
or less.
Node A runs all repairs
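
The flow above could be sketched roughly like this. It is only an illustration under assumptions: `DowntimeTracker`, its method names and the per-table `hint_windows` mapping are hypothetical and not taken from the branch.

```python
from datetime import datetime, timedelta


class DowntimeTracker:
    """Hypothetical helper run on Node B: remembers when peers went down."""

    def __init__(self, hint_windows):
        # hint_windows: table name -> hint window as a timedelta (assumed input)
        self.hint_windows = hint_windows
        self.down_since = {}

    def on_down(self, node, now):
        # Keep the earliest timestamp if the node is reported down repeatedly.
        self.down_since.setdefault(node, now)

    def on_up(self, node, now):
        """Return the tables whose hint window is no longer than the downtime."""
        went_down = self.down_since.pop(node, None)
        if went_down is None:
            return []
        downtime = now - went_down
        return [table for table, window in sorted(self.hint_windows.items())
                if window <= downtime]


tracker = DowntimeTracker({
    "ks.short": timedelta(hours=3),
    "ks.equal": timedelta(hours=4),
    "ks.long": timedelta(hours=6),
})
tracker.on_down("A", datetime(2015, 12, 7, 12, 0))
tables_to_repair = tracker.on_up("A", datetime(2015, 12, 7, 16, 0))
```

With the 4 hour outage from the example, `tables_to_repair` holds the tables with a 3 hour and a 4 hour hint window, and Node B would send one repair job per table to Node A.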


I'll continue working on the ability to pause all repairs and on the prevention mechanism.
For jobs, the prevention mechanism now checks the job history for repairs and only reports
that it *can* run a repair if some range hasn't been repaired within the hint window (it's
still based on the interval though, so the repair shouldn't run more than once per interval
in the normal case).
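
A minimal sketch of that history check, assuming the history can be reduced to "last successful repair per range". The function name `can_run_repair`, the shape of `last_repaired` and the reading of "within the hint window" as "finished no longer ago than the hint window" are assumptions for illustration.

```python
from datetime import datetime, timedelta


def can_run_repair(last_repaired, now, hint_window):
    """Return True if at least one range lacks a repair within the hint window.

    last_repaired maps a (start_token, end_token) range to the time its last
    repair finished, or None if the range has never been repaired.
    """
    for finished_at in last_repaired.values():
        if finished_at is None or now - finished_at > hint_window:
            return True
    return False


now = datetime(2015, 12, 7, 16, 0)
history = {
    (0, 100): now - timedelta(hours=1),    # repaired recently
    (100, 200): now - timedelta(hours=5),  # older than the hint window
}
needs_repair = can_run_repair(history, now, timedelta(hours=3))
```

Here `needs_repair` is `True` because the second range was last repaired 5 hours ago, outside the 3 hour window. The interval-based limit mentioned above is not modelled in this sketch.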

I should probably also extend the prevention mechanism so that it avoids running multiple
repairs for a single node at the same time. After that I'll add the possibility to run
parallel repair tasks over the cluster.


The git branch is [here|].

> Automatic repair scheduling
> ---------------------------
>                 Key: CASSANDRA-10070
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>             Fix For: 3.x
> Scheduling and running repairs in a Cassandra cluster is most often a required task,
but it can be hard for new users and it also requires a bit of manual configuration.
There are good tools out there that can be used to simplify things, but wouldn't this be a
good feature to have inside of Cassandra? To automatically schedule and run repairs, so that
when you start up your cluster it basically maintains itself in terms of normal anti-entropy,
with the possibility for manual configuration.

This message was sent by Atlassian JIRA
