cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12901) Repair may hang if node dies during sync
Date Tue, 15 Nov 2016 18:49:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667927#comment-15667927 ]

Paulo Motta commented on CASSANDRA-12901:
-----------------------------------------

The dtest is testing two scenarios:
1 - failed replica syncing to other participant
2 - failed replica syncing from coordinator

But there was an error in the original dtest that made it test only case 1. After I fixed
case 2, the repair session fails because streaming breaks, but streaming fails before the
failure detector (FD) detects that the node is down, so an anti-compaction request is still
sent to the failed replica, which never replies, making repair hang again. So this uncovered
another bug: if a node fails in the middle of anti-compaction, repair will also hang.
I will address this in the same ticket, but will keep it as Patch Available (PA) to get
initial feedback.
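
To illustrate the failure mode above, here is a minimal sketch in Python (the language of the
dtests) of a coordinator that fails any outstanding request when a node is reported down,
instead of waiting forever for a reply. It is not the Cassandra patch: PendingReplies,
expect_reply, on_reply and on_node_down are hypothetical names, and the real failure detector
and messaging APIs are not modelled.

```python
import threading
from concurrent.futures import Future


class PendingReplies:
    """Hypothetical tracker of replies a repair coordinator is waiting for."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # endpoint -> list of Futures awaiting a reply

    def expect_reply(self, endpoint):
        # Register an outstanding request (e.g. sync or anti-compaction).
        future = Future()
        with self._lock:
            self._pending.setdefault(endpoint, []).append(future)
        return future

    def on_reply(self, endpoint, result):
        # Simplification: one reply completes every request pending on endpoint.
        with self._lock:
            futures = self._pending.pop(endpoint, [])
        for future in futures:
            future.set_result(result)

    def on_node_down(self, endpoint):
        # Called when the node is reported down: fail the pending requests
        # so that waiters see an error instead of blocking indefinitely.
        with self._lock:
            futures = self._pending.pop(endpoint, [])
        for future in futures:
            future.set_exception(
                ConnectionError("%s died before replying" % endpoint))
```

The key point of the sketch is that the coordinator must keep listening for node-down events
for as long as any request to that node is outstanding. If it stops listening early, a
request sent to a dead replica is never completed and the caller hangs.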

> Repair may hang if node dies during sync
> ----------------------------------------
>
>                 Key: CASSANDRA-12901
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12901
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>
> Since the repair coordinator unregisters from the FD after validation (CASSANDRA-3569),
> if the initiator of a RemoteSyncTask fails, the coordinator will never know the sync task
> failed and will hang.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
