cassandra-commits mailing list archives

From "Marcus Olsson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
Date Mon, 15 Feb 2016 16:51:18 GMT


Marcus Olsson commented on CASSANDRA-10070:

All data centers involved in a repair must be available for a repair to start/succeed, so
if we make the lock resource dc-aware and try to create the lock by contacting a node in each
involved data center with LOCAL_SERIAL consistency, that should be sufficient to ensure correctness
without the need for a global lock. This will also play well with both the dc_parallelism
global option and the --local or --dcs table repair options.
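
Just to make the idea concrete, here is a rough sketch of what the per-DC CAS could look like (the repair_lock table, its TTL and the use of the client driver API are assumptions for illustration only, not part of any patch):

{code:java}
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class DcAwareRepairLock
{
    /**
     * Tries to take the repair lock for one data center by doing a
     * compare-and-set insert against a coordinator in that DC.
     * LOCAL_SERIAL keeps the Paxos round local to the contacted DC,
     * so no global lock is needed.
     */
    static boolean tryLockDc(Session session, String dc, String holder)
    {
        Statement cas = new SimpleStatement(
            "INSERT INTO system_distributed.repair_lock (resource, holder) " +
            "VALUES (?, ?) IF NOT EXISTS USING TTL 60",
            "RepairResource-" + dc, holder);
        cas.setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);
        cas.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        return session.execute(cas).wasApplied();
    }

    /**
     * A repair is only started if the lock could be taken in every involved
     * data center; otherwise any locks already taken would be released again.
     */
    static boolean tryLockAll(Session session, Iterable<String> dcs, String holder)
    {
        for (String dc : dcs)
        {
            if (!tryLockDc(session, dc, holder))
                return false; // release previously taken locks here
        }
        return true;
    }
}
{code}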

The second alternative is probably the most desirable. Actually, dc_parallelism by itself
might cause problems, since we can end up in a situation where all repairs run on a single node
or range, overloading those nodes. If we are to support concurrent repairs in the first pass,
I think we need both the dc_parallelism and node_parallelism options together.

This is becoming a bit complex and there probably are some edge cases and/or starvation scenarios,
so we should think carefully about them before jumping into implementation. What do you think about
this approach? Should we stick to a simpler non-parallel version in the first pass or think
this through and already support parallelism in the first version?

I like the approach of using local serial for each dc and having specialized keys. I think
we could include the dc parallelism lock as "RepairResource-{dc}-{i}" but only allow one
repair per data center by hardcoding "i" to 1 in the first pass. This should make upgrades
easier when we do allow parallel repairs. I like the node locks approach as well, but as you
say there are probably some edge cases, so we could hold off on adding them until we allow parallel
repairs, and I don't think introducing them later would break upgrades.
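
To illustrate why hardcoding "i" keeps the upgrade path open, a small sketch of the naming scheme (the helper and the slot probing are hypothetical):

{code:java}
public class RepairLockSlots
{
    /** Proposed lock resource naming; "i" is the parallelism slot within a DC. */
    static String lockResource(String dc, int i)
    {
        return String.format("RepairResource-%s-%d", dc, i);
    }

    /**
     * First pass: dcParallelism is effectively hardcoded to 1, so only
     * "RepairResource-{dc}-1" is ever used. When parallel repairs are allowed
     * later, the scheduler can probe slots 1..dcParallelism and take the first
     * free one -- the key format stays the same, so upgrades are unaffected.
     */
    static String findFreeSlot(String dc, int dcParallelism, java.util.Set<String> takenLocks)
    {
        for (int i = 1; i <= dcParallelism; i++)
        {
            String resource = lockResource(dc, i);
            if (!takenLocks.contains(resource))
                return resource;
        }
        return null; // all slots busy in this DC
    }
}
{code}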

We should also think better about possible failure scenarios and network partitions. What
happens if the node cannot renew locks in a remote DC due to a temporary network partition
but the repair is still running? We should probably cancel a repair if we are not able to renew
the lock, and also have some kind of garbage collector that kills ongoing repair sessions without
associated locks, to protect against disrespecting the configured dc_parallelism and node_parallelism.

I agree, and we could probably store the parent repair session id in an extra column of the
lock table and have a thread wake up periodically to check whether there are repair sessions without
locks. But then we must somehow be able to differentiate between user-defined and automatically scheduled
repair sessions. It could be done by having all repairs go through this scheduling interface,
which would also reduce user mistakes such as running multiple repairs in parallel. Another alternative
is to have a custom flag in the parent repair that makes the garbage collector ignore it if
it's user-defined. I think the garbage collector / cancel-repairs-when-unable-to-renew-the-lock feature
is something that should be included in the first pass.
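
As a rough sketch of that garbage collector (every name below is hypothetical, the real integration points would of course live inside the repair scheduler):

{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RepairLockReaper
{
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();

    public void start()
    {
        // Wake up periodically and abort scheduled repair sessions whose
        // parent session id is no longer referenced by any live lock row.
        executor.scheduleWithFixedDelay(this::reap, 1, 1, TimeUnit.MINUTES);
    }

    private void reap()
    {
        Set<UUID> lockedSessions = parentSessionsInLockTable();
        for (UUID parentSession : runningScheduledRepairs()) // user-defined repairs excluded
        {
            if (!lockedSessions.contains(parentSession))
                abortRepair(parentSession); // lock was lost or never renewed
        }
    }

    // Placeholders for the actual integration points:
    private Set<UUID> parentSessionsInLockTable() { return Collections.emptySet(); }
    private Set<UUID> runningScheduledRepairs() { return Collections.emptySet(); }
    private void abortRepair(UUID parentSession) { }
}
{code}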

The most basic failure scenarios should be covered by retrying a repair if it fails and logging
a warning/error based on how many times it has failed. Could the retry behaviour cause some unexpected
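
For reference, the retry-and-log part could be as simple as something like this (attempt counts and log levels are just placeholders):

{code:java}
import java.util.function.BooleanSupplier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class RepairRetry
{
    private static final Logger logger = LoggerFactory.getLogger(RepairRetry.class);

    /**
     * Reruns a failed repair a bounded number of times, logging a warning on
     * intermediate failures and an error once the retry budget is exhausted.
     */
    static boolean runWithRetry(BooleanSupplier repairTask, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (repairTask.getAsBoolean())
                return true;
            if (attempt < maxAttempts)
                logger.warn("Repair attempt {}/{} failed, retrying", attempt, maxAttempts);
            else
                logger.error("Repair failed after {} attempts, giving up", maxAttempts);
        }
        return false;
    }
}
{code}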

> Automatic repair scheduling
> ---------------------------
>                 Key: CASSANDRA-10070
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>             Fix For: 3.x
>         Attachments: Distributed Repair Scheduling.doc
> Scheduling and running repairs in a Cassandra cluster is most often a required task,
> but it can be hard for new users and it also requires a bit of manual configuration.
> There are good tools out there that can be used to simplify things, but wouldn't this be a
> good feature to have inside of Cassandra? To automatically schedule and run repairs, so that
> when you start up your cluster it basically maintains itself in terms of normal anti-entropy,
> with the possibility for manual configuration.
