cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11848) replace address can "succeed" without actually streaming anything
Date Mon, 23 May 2016 23:47:12 GMT


Paulo Motta commented on CASSANDRA-11848:

Reproduced this with a [simple replace_address dtest|].
Also [added bootstrap dtests|]
to verify that bootstrap fails if any replica is down when {{cassandra.consistent.rangemovement=true}}
or if more than RF replicas are down and {{cassandra.consistent.rangemovement=false}}.

What happens is that the {{replace_address}} node does not consider itself a pending endpoint,
but instead replaces the old node with itself in {{TokenMetadata}}, so it considers itself
a valid source in {{RangeStreamer.getRangeFetchMap}}, even though it only streams from other
replicas. In practice, this means the replacing node only streams from live replicas and silently
ignores down replicas (even if all other replicas are down).
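
To illustrate the failure mode, here is a toy model in Python. It is a sketch, not Cassandra's code: {{build_fetch_map}}, the {{ranges_with_sources}} dict and the {{is_alive}} callback are hypothetical stand-ins for {{RangeStreamer.getRangeFetchMap}}, the replica view from {{TokenMetadata}} and the failure detector filter.

```python
def build_fetch_map(ranges_with_sources, local, is_alive):
    """Toy model: return {source_node: [ranges to stream from it]}."""
    fetch_map = {}
    for rng, sources in ranges_with_sources.items():
        found_source = False
        for node in sources:
            if node == local:
                # The local node counts as a valid source, so the range is
                # treated as covered even though nothing is streamed for it.
                found_source = True
                continue
            if not is_alive(node):
                continue  # the source filter drops nodes that are down
            fetch_map.setdefault(node, []).append(rng)
            found_source = True
            break  # stream each range from only one other node
        if not found_source:
            raise RuntimeError(f"unable to find sufficient sources for range {rng}")
    return fetch_map


# Assumed scenario: replacing node "n1" has already put itself in the
# replica list in place of the node it replaces.
ranges = {"(0,100]": ["n1", "n2", "n3"]}

# All peers alive: one live peer is chosen as the source.
print(build_fetch_map(ranges, "n1", lambda n: True))

# All peers down: no error is raised and the plan is empty.
print(build_fetch_map(ranges, "n1", lambda n: n == "n1"))
```

In the second call the local node alone satisfies the "found a source" check, so the replace appears to succeed with an empty stream plan.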

Considering the local node a valid source was added in CASSANDRA-4200, since it is a valid
scenario during single-node moves. While CASSANDRA-8523 should fix this by making replace
go through the normal bootstrap path, the simple fix for now is to not consider the local
node a valid source during bootstraps/replaces. This does not affect the CASSANDRA-4200 dtest.
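
A minimal sketch of that rule in Python, assuming a simplified fetch-map builder. The function {{build_fetch_map_checked}} and its {{use_local_as_source}} flag are hypothetical names for illustration and are not taken from the patch: the flag would be true for single-node moves and false for bootstraps/replaces.

```python
def build_fetch_map_checked(ranges_with_sources, local, is_alive, use_local_as_source):
    """Toy model: return {source_node: [ranges]}, or raise if a range has no source."""
    fetch_map = {}
    for rng, sources in ranges_with_sources.items():
        found_source = False
        for node in sources:
            if node == local:
                # Only a move may treat the local node as already holding the data.
                if use_local_as_source:
                    found_source = True
                continue
            if not is_alive(node):
                continue  # down nodes cannot be streamed from
            fetch_map.setdefault(node, []).append(rng)
            found_source = True
            break  # one remote source per range
        if not found_source:
            raise RuntimeError(f"unable to find sufficient sources for range {rng}")
    return fetch_map


ranges = {"(0,100]": ["n1", "n2", "n3"]}
all_peers_down = lambda n: n == "n1"

# Replace/bootstrap: the local node is not a valid source, so this fails loudly.
try:
    build_fetch_map_checked(ranges, "n1", all_peers_down, use_local_as_source=False)
except RuntimeError as err:
    print("replace failed:", err)

# Single-node move: the local node may still count as a source.
print(build_fetch_map_checked(ranges, "n1", all_peers_down, use_local_as_source=True))
```

With the flag off, a replace whose other replicas are all down raises an error instead of silently executing an empty plan, while the move case keeps its existing behaviour.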

Patch and tests below:

For some reason I'm not able to submit tests to cassCI. I will try again later and report
back here when tests are available.

> replace address can "succeed" without actually streaming anything
> -----------------------------------------------------------------
>                 Key: CASSANDRA-11848
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jeremiah Jordan
>            Assignee: Paulo Motta
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
> When you do a replace address and the new node has the same IP as the node it is replacing,
> then the following check can let the replace be successful even if we think all the other
> nodes are down:
> As the FailureDetectorSourceFilter will exclude the other nodes, so an empty stream plan
> gets executed.

This message was sent by Atlassian JIRA
