cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-7816) Duplicate DOWN/UP Events Pushed with Native Protocol
Date Wed, 04 Mar 2015 04:39:04 GMT


Stefania updated CASSANDRA-7816:
    Attachment: cassandra_7816.txt

Submitting a patch for 2.0.

The duplicate DOWN notification is caused by {{Gossiper.handleMajorStateChange}} passing the
remote endpoint state to {{StorageService.onRestart}}, which then incorrectly concludes that
the node was not previously marked down. I changed it so that {{onRestart}} receives the local
state instead, when that state is not null. If it is null, we do not call {{onRestart}} at all.
Please confirm that this does not introduce problems (I checked all {{onRestart}}
implementations and it looks OK to me).
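
To illustrate the DOWN fix described above, here is a minimal sketch, not the attached patch. All class, field and method bodies below are simplified, hypothetical stand-ins for {{Gossiper}}, {{EndpointState}} and {{StorageService}}; in particular, the assumption that {{onRestart}} emits DOWN whenever the state it is handed is still alive is only inferred from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; these are not Cassandra's real classes.
final class RestartNotifySketch {
    /** Minimal endpoint state: only the liveness bit matters for this sketch. */
    static final class State {
        final boolean alive;
        State(boolean alive) { this.alive = alive; }
    }

    /** Stand-in for a subscriber such as StorageService. */
    static final class Subscriber {
        final List<String> events = new ArrayList<>();

        // Assumed behaviour: emit DOWN only if the given state says the node was still alive.
        void onRestart(String endpoint, State state) {
            if (state.alive) events.add("DOWN " + endpoint);
        }
    }

    final Map<String, State> localStates = new HashMap<>();
    final Subscriber subscriber = new Subscriber();

    /** The node is convicted locally, so one DOWN event is pushed here. */
    void markDead(String endpoint) {
        localStates.put(endpoint, new State(false));
        subscriber.events.add("DOWN " + endpoint);
    }

    /** Sketch of the patched logic: hand the local state to onRestart, skip the call if there is none. */
    void handleMajorStateChange(String endpoint, State remoteState) {
        State localState = localStates.get(endpoint);
        if (localState != null) {
            subscriber.onRestart(endpoint, localState);
        }
        // The remote state then becomes our local view of the endpoint.
        localStates.put(endpoint, remoteState);
    }
}
```

With the remote state (which is alive after a restart) passed to {{onRestart}} instead, the subscriber in this sketch would add a second DOWN for a node that had already been marked down locally.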

The multiple UP notifications are caused by the call to {{markAlive()}} in {{Gossiper.applyStateLocally()}}
when receiving multiple gossip messages. Because {{markAlive()}} only marks the node as alive
after receiving an echo message (CASSANDRA-3533), there is a delay during which the node is
still not marked as alive. If gossip messages are received during this period, we incorrectly
call {{markAlive()}} multiple times in {{applyStateLocally()}}. I fixed it by adding a flag
to {{EndpointState}} and by checking this flag in {{markAlive()}}: if an echo is outstanding,
we do not send another one until we have received an answer. When there is a major change,
{{markAlive()}} is called on the remote state, for which this flag is not set, so we try
again and send another echo message from {{markAlive()}} even if we did not receive a reply
to a previous echo request.
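
The echo guard described above could look roughly like the following sketch. It is an illustration under assumptions, not the attached patch: the class, the {{echoOutstanding}} field name and the {{onEchoReply}} method are hypothetical, and the real echo exchange is replaced by a list of recorded sends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are not Cassandra's real classes.
final class EchoGuardSketch {
    /** Minimal endpoint state carrying the new flag (field name is an assumption). */
    static final class State {
        boolean alive;
        boolean echoOutstanding;
    }

    final List<String> sentEchoes = new ArrayList<>();
    final List<String> events = new ArrayList<>();

    /** Called for each gossip message that reports the endpoint while it is not yet alive. */
    void markAlive(String endpoint, State state) {
        // For brevity the liveness check lives here; skip if already alive or an echo is pending.
        if (state.alive || state.echoOutstanding) {
            return;
        }
        state.echoOutstanding = true;
        sentEchoes.add(endpoint); // stands in for sending the echo request
    }

    /** Called when the reply to the echo arrives; only now is the node really marked alive. */
    void onEchoReply(String endpoint, State state) {
        state.echoOutstanding = false;
        if (!state.alive) {
            state.alive = true;
            events.add("UP " + endpoint);
        }
    }
}
```

In this sketch, repeated {{markAlive}} calls on the same state object send a single echo and lead to a single UP event, while a fresh state object (as with the remote state after a major change) has the flag unset and therefore triggers a new echo.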

> Duplicate DOWN/UP Events Pushed with Native Protocol
> ----------------------------------------------------
>                 Key: CASSANDRA-7816
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>            Reporter: Michael Penick
>            Assignee: Stefania
>            Priority: Minor
>             Fix For: 2.0.13, 2.1.4
>         Attachments: cassandra_7816.txt, tcpdump_repeating_status_change.txt, trunk-7816.txt
> Added "MOVED_NODE" as a possible type of topology change and also specified that it is
> possible to receive the same event multiple times.

This message was sent by Atlassian JIRA
