cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
Date Thu, 04 Feb 2016 22:55:40 GMT


Paulo Motta commented on CASSANDRA-10070:

Nice work [~molsson]. Overall the design doc looks great and addresses most of the issues
raised previously, just a few minor comments/questions:
* I second [~yukim]'s first question above, in that we need to better specify how cluster-wide
repair parallelism is handled: is it fixed or configurable? Can a node run repair for multiple
ranges in parallel? Perhaps we should have {{node_repair_parallelism}} (default 1) and
{{dc_repair_parallelism}} (default 1) global configs and reject starting repairs above those
limits.
* For subrange repair, we could maybe have something similar to reaper's
{{segmentCount}} option, but since this would add more complexity we could leave it for a separate
ticket.
* While pausing repair is a nice feature for user-based interruptions, we could probably embed
system-known interruptions (such as when a bootstrap or upgrade is going on) in the default
rejection logic.
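
To make the first and third points concrete, here is a minimal sketch of what such rejection logic could look like. It is not Cassandra code and not taken from the design doc: the class {{RepairAdmission}}, its methods, and the {{busy_nodes}} set are hypothetical names, and it assumes {{dc_repair_parallelism}} caps the number of concurrent repairs per datacenter while {{node_repair_parallelism}} caps them per node.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class RepairAdmission:
    """Hypothetical admission check for starting repairs (illustration only)."""

    # Assumed semantics: max concurrent repairs per node / per datacenter.
    node_repair_parallelism: int = 1
    dc_repair_parallelism: int = 1
    # Running repair counts keyed by (datacenter, node).
    running: Dict[Tuple[str, str], int] = field(default_factory=dict)
    # Nodes with a known system interruption (bootstrap, upgrade, ...).
    busy_nodes: Set[str] = field(default_factory=set)

    def _dc_count(self, dc: str) -> int:
        # Total repairs currently running in the given datacenter.
        return sum(n for (d, _), n in self.running.items() if d == dc)

    def try_start(self, dc: str, node: str) -> Tuple[bool, str]:
        """Return (accepted, reason); record the repair when accepted."""
        if node in self.busy_nodes:
            return False, "node is bootstrapping or upgrading"
        if self.running.get((dc, node), 0) >= self.node_repair_parallelism:
            return False, "node_repair_parallelism reached"
        if self._dc_count(dc) >= self.dc_repair_parallelism:
            return False, "dc_repair_parallelism reached"
        self.running[(dc, node)] = self.running.get((dc, node), 0) + 1
        return True, "accepted"

    def finish(self, dc: str, node: str) -> None:
        """Release one running repair slot for the node, if any."""
        key = (dc, node)
        remaining = self.running.get(key, 0) - 1
        if remaining > 0:
            self.running[key] = remaining
        else:
            self.running.pop(key, None)
```

With the defaults of 1, a second repair on the same node or in the same datacenter is rejected until {{finish}} is called, and a node listed in {{busy_nodes}} is rejected regardless of the limits. A real implementation would need this state to be shared cluster-wide rather than held in one process.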

Maybe the Spotify reaper folks have something to add based on their experience with automatic
repair scheduling (cc [~Bj0rn], [~zvo]).

> Automatic repair scheduling
> ---------------------------
>                 Key: CASSANDRA-10070
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>             Fix For: 3.x
>         Attachments: Distributed Repair Scheduling.doc
> Scheduling and running repairs in a Cassandra cluster is most often a required task,
but it can be hard for new users and it also requires a bit of manual configuration.
There are good tools out there that can be used to simplify things, but wouldn't this be a
good feature to have inside of Cassandra? It would automatically schedule and run repairs, so that
when you start up your cluster it basically maintains itself in terms of normal anti-entropy,
with the possibility for manual configuration.

This message was sent by Atlassian JIRA
