cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3486) Node Tool command to stop repair
Date Tue, 26 Apr 2016 21:56:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259008#comment-15259008 ]

Paulo Motta commented on CASSANDRA-3486:
----------------------------------------

Thanks for the feedback [~nickmbailey]. See follow-up below:

bq. If the abort is initiated on the coordinator can we return the success/failure of the
attempt to abort on the participants as well? And vice versa? Similarly for the list of results
when aborting all jobs.

We could, but in this initial implementation I opted to take an optimistic approach to keep
the protocol simple and non-blocking. If for some reason there is a network partition and
"orphaned" sessions keep running, you can always abort them individually later. Do you think
a blocking + timeout approach would be preferable?
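
To make the trade-off concrete, here is a minimal sketch contrasting the two approaches. It is not the code in the patch: {{AbortStrategies}}, {{Participant}} and both method names are hypothetical, and a participant is modelled as anything that answers an abort request with a future.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these names come from the Cassandra code base. */
public class AbortStrategies {

    /** Assumed abstraction of a remote node that can be asked to abort a session. */
    public interface Participant {
        CompletableFuture<Boolean> abort(UUID sessionId);
    }

    /** Optimistic: send the abort to every participant and return without waiting. */
    public static void abortOptimistic(Map<String, Participant> participants, UUID sessionId) {
        for (Participant p : participants.values()) {
            p.abort(sessionId); // the reply, if any, is deliberately not awaited
        }
    }

    /**
     * Blocking + timeout: send all aborts first, then wait up to timeoutMillis
     * for each reply. A missing, failed or late reply is reported as false.
     */
    public static Map<String, Boolean> abortBlocking(Map<String, Participant> participants,
                                                     UUID sessionId,
                                                     long timeoutMillis) {
        Map<String, CompletableFuture<Boolean>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, Participant> e : participants.entrySet()) {
            pending.put(e.getKey(), e.getValue().abort(sessionId));
        }

        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<Boolean>> e : pending.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                results.put(e.getKey(), false);
            } catch (ExecutionException | TimeoutException ex) {
                results.put(e.getKey(), false);
            }
        }
        return results;
    }
}
```

In this sketch the blocking variant waits per participant, so the worst case is roughly the timeout multiplied by the number of unresponsive nodes; that cost is part of what the optimistic variant avoids.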

bq. Can we make sure we are testing the case where for whatever reason a coordinator or participant
receives an abort for a repair it doesn't know about?

Sure. One of the changes in this patch that I forgot to mention is that all messages are validated
against the repair session UUID, so if a node receives a message from a repair it doesn't
know about, it logs and ignores it.
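
As an illustration of that validation, a minimal sketch follows. The class and method names ({{RepairMessageFilter}}, {{register}}, {{handle}}) and the string message types are hypothetical and only show the idea, not the actual code in the patch.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names come from the Cassandra code base. */
public class RepairMessageFilter {

    // Repair sessions this node currently knows about, keyed by session UUID.
    private final Map<UUID, String> activeSessions = new ConcurrentHashMap<>();

    public void register(UUID sessionId, String description) {
        activeSessions.put(sessionId, description);
    }

    public boolean isActive(UUID sessionId) {
        return activeSessions.containsKey(sessionId);
    }

    /**
     * Returns true if the message was handled, or false if it referred to an
     * unknown session and was therefore logged and ignored.
     */
    public boolean handle(UUID sessionId, String messageType) {
        if (!activeSessions.containsKey(sessionId)) {
            System.err.println("Ignoring " + messageType
                    + " for unknown repair session " + sessionId);
            return false;
        }
        if ("ABORT".equals(messageType)) {
            // An abort ends the session, so later messages for it are ignored too.
            activeSessions.remove(sessionId);
        }
        return true;
    }
}
```

An abort for a session the node never saw, and a second abort for a session that was already aborted, both take the log-and-ignore path here.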

bq. Since we are now tracking repairs by uuid like this, can we expose a progress API outside
of the jmx notification process? An mbean for retrieving the progress/status of a repair job
by uuid?

We could, but we currently don't keep state or progress information in the repair session.
Furthermore, we clear repair session information as soon as it's finished, so the list repairs
stub only lists currently active repairs. To support this we would need to maintain progress
status and provide some way to clear repair information after some time.
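
A rough sketch of what such bookkeeping could look like, assuming a simple in-memory registry with a retention window. {{RepairStatusRegistry}}, its {{State}} enum and the percentage field are all hypothetical; nothing like this exists in the patch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names come from the Cassandra code base. */
public class RepairStatusRegistry {

    public enum State { RUNNING, COMPLETED, FAILED, ABORTED }

    private static final class Entry {
        final State state;
        final int percentComplete;
        final long updatedAtMillis;

        Entry(State state, int percentComplete, long updatedAtMillis) {
            this.state = state;
            this.percentComplete = percentComplete;
            this.updatedAtMillis = updatedAtMillis;
        }
    }

    private final Map<UUID, Entry> entries = new ConcurrentHashMap<>();
    private final long retentionMillis;

    public RepairStatusRegistry(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    /** Records the latest known status of a repair, replacing any earlier one. */
    public void update(UUID repairId, State state, int percentComplete, long nowMillis) {
        entries.put(repairId, new Entry(state, percentComplete, nowMillis));
    }

    public Optional<State> stateOf(UUID repairId) {
        return Optional.ofNullable(entries.get(repairId)).map(e -> e.state);
    }

    /** Drops finished repairs whose last update is older than the retention window. */
    public void expire(long nowMillis) {
        entries.values().removeIf(e ->
                e.state != State.RUNNING && nowMillis - e.updatedAtMillis > retentionMillis);
    }
}
```

Running repairs are never expired in this sketch; only finished ones age out, which is the "clear after some time" behaviour described above.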

I personally think we should go this route of making repair more stateful, which would not only
improve monitoring but also allow us to break up a repair job into more decoupled subtasks,
simplifying the single chain of futures we have today, which can be quite complex to understand
and error-prone.

> Node Tool command to stop repair
> --------------------------------
>
>                 Key: CASSANDRA-3486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>         Environment: JVM
>            Reporter: Vijay
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: repair
>             Fix For: 2.1.x
>
>         Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, if the validation compaction is stopped then the repair will hang.
> This ticket will allow users to kill the original repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
