cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12653) In-flight shadow round requests
Date Wed, 22 Feb 2017 15:22:44 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878461#comment-15878461 ]

Joel Knighton commented on CASSANDRA-12653:
-------------------------------------------

I think I can answer these; feel free to correct me, [~spodxx@gmail.com].

In order,
* Presently, the tests depend on the mock MessagingService, which was added to 3.10+ in [CASSANDRA-12016].
We'd need new tests for 2.2/3.0, which would be desirable, but I have no great ideas for how to
do it other than fiddly byteman tests.
* I agree with this. Stefan and I discussed it on the first pass of review, and I wouldn't
mind eliminating that check altogether and making it a boolean. On the one hand, it's cheap to check
deserialization time, and doing so excludes messages that were deserialized before the check.
On the other hand, there's no meaningful distinction in correctness-preserving behavior between those
messages and arbitrarily delayed gossip messages, and we need to handle the latter correctly anyway. I'm
mostly concerned about this check giving future readers false hope :).
* It also seems to me that it doesn't presently need to be synchronized. That said, I assumed
it was a defensive choice, because the internals are definitely not safe to call from multiple
threads and someone may make that mistake in the future.
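The trade-off in the second and third points can be sketched as follows. This is a hypothetical illustration, not Cassandra's Gossiper: the class `ShadowRoundGuard`, its methods, and the nanosecond timestamps are assumptions made for the example. `acceptWithFlag` is the boolean-only approach; `acceptWithTimestamp` puts the cheap deserialization-time check in front of it. Both methods are `synchronized` purely as the defensive choice described in the third point.

```java
// Hypothetical sketch (not Cassandra code): two ways to decide whether a
// shadow round reply should still be processed.
final class ShadowRoundGuard {
    // Boolean approach: true until the first reply completes the round.
    private boolean inShadowRound;
    // Timestamp approach: when the first SYN of this round was sent.
    private long firstSynSentAtNanos;

    // Defensive synchronization: callers may not be single-threaded forever.
    synchronized void startShadowRound(long nowNanos) {
        inShadowRound = true;
        firstSynSentAtNanos = nowNanos;
    }

    // Flag only: accept exactly one reply per shadow round.
    synchronized boolean acceptWithFlag() {
        if (!inShadowRound) {
            return false; // late reply from another seed
        }
        inShadowRound = false; // the first reply finishes the round
        return true;
    }

    // Flag plus timestamp: also drop replies deserialized before the SYN went out.
    synchronized boolean acceptWithTimestamp(long deserializedAtNanos) {
        if (deserializedAtNanos < firstSynSentAtNanos) {
            return false; // cannot be an answer to this round's SYN
        }
        return acceptWithFlag();
    }
}
```

A seed reply that is merely delayed (deserialized after the SYN was sent, but after the round already finished) passes the timestamp check and is rejected only by the flag, which illustrates why the extra check adds little on its own: delayed messages have to be handled either way.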

> In-flight shadow round requests
> -------------------------------
>
>                 Key: CASSANDRA-12653
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>            Priority: Minor
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Bootstrapping or replacing a node in the cluster requires gathering and checking host
IDs or tokens by doing a gossip "shadow round" once before joining the cluster. This is done
by sending a gossip SYN to all seeds until we receive a response with the cluster state, after
which we can move on with the bootstrap process. Receiving a response marks the shadow round
as done and calls {{Gossiper.resetEndpointStateMap}} to clean up the received state again.
> The issue here is that at this point there might be other in-flight requests, and it's
very likely that shadow round responses from other seeds will be received afterwards, while
the bootstrap process in its current state doesn't expect this to happen (e.g. the gossiper may
or may not be enabled).
> One side effect is that MigrationTasks are spawned for each shadow round reply except
the first. Whether a task executes depends on whether {{Gossiper.resetEndpointStateMap}}
has already been called at execution time, which affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}}
at the start of the task. When this happens, you'll see error log messages such as the following:
> {noformat}
> INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1]    2016-09-08 08:36:39,255 FailureDetector.java:223 - unknown endpoint /xx.xx.xx.xx
> {noformat}
> It isn't pretty, but I currently don't see any serious harm from this. It would still
be good to get a second opinion, though (feel free to close as "Won't Fix").
> /cc [~Stefania] [~thobbs]
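The race described in the issue can be modelled in a few lines. The sketch below is a hypothetical simplification, not Cassandra code: `ShadowRoundModel`, `onAck`, and the string-valued state map are invented for illustration, clearing the map stands in for {{Gossiper.resetEndpointStateMap}}, and recording a string stands in for spawning a MigrationTask. With three seeds answering the same shadow round, the unguarded model records two migration tasks (one per reply after the first), while the guarded model drops the late replies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the shadow round race; not Cassandra code.
final class ShadowRoundModel {
    final Map<String, String> endpointStateMap = new HashMap<>();
    final List<String> spawnedMigrationTasks = new ArrayList<>();
    private final boolean dropLateReplies;
    private boolean inShadowRound;

    ShadowRoundModel(boolean dropLateReplies) {
        this.dropLateReplies = dropLateReplies;
    }

    // A SYN has gone out to every seed; any number of them may reply.
    synchronized void startShadowRound() {
        inShadowRound = true;
    }

    // Called once for each ACK received from a seed.
    synchronized void onAck(String seed, Map<String, String> clusterState) {
        if (inShadowRound) {
            // First reply: take the cluster state and finish the round.
            endpointStateMap.putAll(clusterState);
            inShadowRound = false;
            // (host ID / token checks would run here)
            // Clean up again, analogous to Gossiper.resetEndpointStateMap.
            endpointStateMap.clear();
            return;
        }
        if (dropLateReplies) {
            return; // guarded: ignore replies arriving after the round ended
        }
        // Unguarded: the late reply is handled like ordinary gossip and
        // schedules schema migration work for the sending endpoint.
        spawnedMigrationTasks.add("MigrationTask(" + seed + ")");
    }
}
```

The model deliberately ignores the timing subtlety from the description: in the real system, whether a spawned task actually runs (and logs "unknown endpoint") depends on when the reset happens relative to the task's {{FailureDetector}} check.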



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
