cassandra-commits mailing list archives

From "Yuki Morishita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12901) Repair may hang if node dies during sync
Date Tue, 15 Nov 2016 22:14:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668558#comment-15668558 ]

Yuki Morishita commented on CASSANDRA-12901:
--------------------------------------------

You are right that when the remote streaming node (the node that receives the SyncRequest message)
dies, the coordinator is never notified of the failure and the repair hangs. I'd rather make repair
not hang, so bringing back the FD is fine despite the false positives it may introduce.
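To illustrate why staying registered with the FD prevents the hang, here is a minimal Java sketch. It assumes a hypothetical {{EndpointFailureListener}} callback and a hypothetical {{SyncCoordinatorSketch}} class; neither is Cassandra's real API, and the sketch assumes at most one outstanding sync task per node to stay short. Each remote sync task is a future that completes either when the sync node reports success or when that node is convicted; without the conviction path, a dead node's future would stay pending forever.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a failure-detector callback (not Cassandra's interface).
interface EndpointFailureListener {
    void convict(String endpoint);
}

// Hypothetical coordinator that stays registered for failure notifications
// while remote sync tasks are outstanding. Assumes one sync task per node.
class SyncCoordinatorSketch implements EndpointFailureListener {
    // Pending sync tasks, keyed by the node that received the SyncRequest.
    private final Map<String, CompletableFuture<Void>> pendingSyncs = new ConcurrentHashMap<>();

    CompletableFuture<Void> startRemoteSync(String syncNode) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        pendingSyncs.put(syncNode, result);
        return result;
    }

    // Normal path: the sync node reports that its sync task finished.
    void onSyncComplete(String syncNode) {
        CompletableFuture<Void> result = pendingSyncs.remove(syncNode);
        if (result != null) {
            result.complete(null);
        }
    }

    // Failure path: without this callback the future never completes and repair hangs.
    @Override
    public void convict(String endpoint) {
        CompletableFuture<Void> result = pendingSyncs.remove(endpoint);
        if (result != null) {
            result.completeExceptionally(
                    new IllegalStateException("Sync node " + endpoint + " died during sync"));
        }
    }
}
```

In this sketch a false-positive conviction fails the affected sync task instead of leaving it pending indefinitely, which is the trade-off the comment accepts.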

bq. but streaming fails before the FD detects the node is down, so an anti-compaction request
is sent to the failed replica

Hmm, yes, it looks like this can happen. We need to mark the failed node and exclude it from the
anti-compacting nodes rather than relying on the FD alive check in {{AntiCompactionTask}}.
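A minimal Java sketch of that idea follows. The class {{RepairSessionStateSketch}} and its methods are hypothetical and do not reflect the actual {{AntiCompactionTask}} code. The session records a node as failed as soon as its streaming fails, and anti-compaction targets are chosen from that record, so a node the FD still reports as alive is excluded anyway.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical per-session state; names do not match Cassandra's classes.
class RepairSessionStateSketch {
    // Nodes whose streaming failed during this repair session.
    private final Set<String> failedNodes = ConcurrentHashMap.newKeySet();

    // Called as soon as streaming to or from a node fails, even if the
    // failure detector has not yet marked that node down.
    void markFailed(String endpoint) {
        failedNodes.add(endpoint);
    }

    // Select anti-compaction targets from the session's own failure record
    // instead of a possibly stale failure-detector liveness check.
    List<String> antiCompactionTargets(List<String> participants) {
        return participants.stream()
                .filter(endpoint -> !failedNodes.contains(endpoint))
                .collect(Collectors.toList());
    }
}
```

Keeping the failure record inside the session means the decision does not depend on how quickly the FD converges.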


> Repair may hang if node dies during sync
> ----------------------------------------
>
>                 Key: CASSANDRA-12901
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12901
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>
> Since the repair coordinator unregisters from the FD after validation (CASSANDRA-3569),
> if the initiator of a RemoteSyncTask fails, the coordinator will never learn that the sync
> task failed and will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
