cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10446) Run repair with down replicas
Date Mon, 14 Nov 2016 13:46:58 GMT


Paulo Motta commented on CASSANDRA-10446:

bq. Which means we obviously cannot mark anything "repaired" if some node was down. This seems
to be what the last patch is doing, but some of the discussions above seem to suggest this
could be done differently in the future, after CASSANDRA-9143 in particular.  Did I misread
those discussions or did I miss something more fundamental?

When you trigger a repair command (the parent repair session), it triggers many child repair
sessions, typically one for each vnode subrange. At the end of the parent repair session,
only the ranges of successful child repair sessions are anti-compacted: a subset of the child
repair sessions may have failed due to node failures or other errors, so their ranges
cannot be marked as repaired. Likewise, when you trigger a repair with {{--force}}, only a subset
of the child repair sessions may have down nodes, so we can still mark the ranges of successful
child repair sessions (the ones where all nodes were up) as repaired. This is what the
patch currently does, and this behavior will be kept after CASSANDRA-9143.
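
To make the rule above concrete, here is a minimal sketch in Python. It is not Cassandra code: the {{ChildSession}} type, its fields, and {{ranges_to_mark_repaired}} are hypothetical names used only to illustrate which ranges would be eligible for anti-compaction.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ChildSession:
    """Hypothetical model of one child repair session (not a Cassandra class)."""
    token_range: Tuple[int, int]  # illustrative (start, end) token subrange
    succeeded: bool               # False if the session failed for any reason
    all_replicas_up: bool         # False if a down replica was skipped via --force


def ranges_to_mark_repaired(sessions: List[ChildSession]) -> List[Tuple[int, int]]:
    """Return only the ranges whose session succeeded with every replica up."""
    return [
        s.token_range
        for s in sessions
        if s.succeeded and s.all_replicas_up
    ]


sessions = [
    ChildSession((0, 100), succeeded=True, all_replicas_up=True),     # eligible
    ChildSession((100, 200), succeeded=True, all_replicas_up=False),  # replica down
    ChildSession((200, 300), succeeded=False, all_replicas_up=True),  # session failed
]
repaired = ranges_to_mark_repaired(sessions)
```

In this toy run only the first range qualifies; the other two stay unrepaired and would be picked up by a later repair.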

What was brought up here, and might have confused things a bit, is that in both cases (with and
without {{--force}}) streamed sstables are always marked as repaired, which may cause problems
in some failure edge cases (if a repair session fails after part of the syncs have completed).
This limitation in particular will be addressed in CASSANDRA-9143.
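
The edge case can be illustrated with a toy model in Python. This is only a sketch of the scenario described above, not of Cassandra's internals: {{ToyRepairSession}}, its fields, and the sync labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyRepairSession:
    """Hypothetical model: streamed data is flagged repaired as each sync completes."""
    token_range: Tuple[int, int]
    streamed_marked_repaired: List[str] = field(default_factory=list)
    succeeded: bool = False

    def run(self, syncs: List[str], fail_after: Optional[int] = None) -> None:
        """Run syncs in order; fail once `fail_after` syncs have completed."""
        for completed, sync in enumerate(syncs):
            if fail_after is not None and completed == fail_after:
                return  # session fails here; `succeeded` stays False
            # Data streamed by this sync is already marked as repaired.
            self.streamed_marked_repaired.append(sync)
        self.succeeded = True


session = ToyRepairSession((0, 100))
session.run(["sync-A-B", "sync-A-C", "sync-B-C"], fail_after=1)
```

After this run the session has failed, so its range is not considered repaired, yet the data streamed by the first sync already carries the repaired flag. That mismatch is the limitation referred to above.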

Does this clarify your concerns or is there something else we may be missing?

> Run repair with down replicas
> -----------------------------
>                 Key: CASSANDRA-10446
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Blake Eggleston
>            Priority: Minor
>             Fix For: 4.0
> We should have an option of running repair when replicas are down. We can call it -force.
