cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12901) Repair may hang if node dies during sync
Date Tue, 15 Nov 2016 18:49:58 GMT


Paulo Motta commented on CASSANDRA-12901:

The dtest is testing two scenarios:
1 - failed replica syncing to other participant
2 - failed replica syncing from coordinator

But an error in the original dtest made it exercise only case 1. Now that I have fixed
case 2, the repair session fails because streaming breaks, but streaming fails before the
failure detector (FD) notices the node is down. As a result, an anti-compaction request is
still sent to the failed replica, which never replies, and repair hangs again. So this
uncovered another bug: if a node fails in the middle of anti-compaction, repair will also hang.
I will address this in the same ticket as well, but will keep it as PA (Patch Available) to
get initial feedback.
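
To illustrate the hang described above, here is a minimal sketch in Python. It is not
Cassandra's code: PendingTasks, expect_reply, on_reply and convict are hypothetical names
chosen for this example. It assumes the coordinator tracks each outstanding reply as a future
and that a failure-detector callback is still registered while those replies are pending. If
the callback fails every future tied to a dead endpoint, the coordinator gets an error instead
of waiting forever.

```python
from concurrent.futures import Future


class PendingTasks:
    """Hypothetical registry of replies a repair coordinator is waiting on."""

    def __init__(self):
        # endpoint -> futures that complete when that endpoint replies
        self._pending = {}

    def expect_reply(self, endpoint):
        """Record that a reply (sync result, anti-compaction ack) is expected."""
        fut = Future()
        self._pending.setdefault(endpoint, []).append(fut)
        return fut

    def on_reply(self, endpoint, result):
        """Complete every future waiting on this endpoint with its reply."""
        for fut in self._pending.pop(endpoint, []):
            fut.set_result(result)

    def convict(self, endpoint):
        """Assumed failure-detector callback: fail all futures for a dead endpoint."""
        for fut in self._pending.pop(endpoint, []):
            fut.set_exception(ConnectionError(f"{endpoint} died during repair"))


tasks = PendingTasks()
sync = tasks.expect_reply("replica-1")
tasks.convict("replica-1")          # node is reported down while the task is pending
error = sync.exception()            # returns at once because the future is already failed
```

Without the convict step, nothing ever completes the future for the dead replica, which is
the hang this ticket describes.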

> Repair may hang if node dies during sync
> ----------------------------------------
>                 Key: CASSANDRA-12901
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
> Since the repair coordinator unregisters from the FD after validation (CASSANDRA-3569),
> if the initiator of a RemoteSyncTask fails, the coordinator will never know the sync task
> failed and will hang.

