cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-13700) Heartbeats can cause gossip information to go permanently missing on certain nodes
Date Wed, 19 Jul 2017 21:56:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Knighton updated CASSANDRA-13700:
--------------------------------------
    Description: 
In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} from the corresponding
{{EndpointState}} to the {{EndpointState}} to send. When we're getting state for ourselves,
this means that we add a reference to the local {{HeartBeatState}}. Then, once we've built
a message (in either the Syn or Ack handler), we send it through the {{MessagingService}}.
If the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may run before the Ack
or Ack2 is serialized. When the {{GossipTask}} acquires the gossip {{taskLock}}, it may
increment the {{HeartBeatState}} version of the local node as stored in the endpoint state
map. Then, when we finally serialize the Ack or Ack2, we follow the reference to the
{{HeartBeatState}} and serialize it with a higher version than we saw when constructing the
message.
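
The aliasing problem can be illustrated with a minimal, single-threaded sketch. This is not
Cassandra code: the {{Heartbeat}} and {{OutboundState}} classes and the {{versionOnWire}}
method are hypothetical stand-ins, and the "periodic task" is simulated by a direct call
between construction and serialization. Copying the heartbeat at construction time is shown
only as one way to pin the version; it is an assumption, not necessarily the fix for this
ticket.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatAliasingSketch {
    /** Hypothetical stand-in for a mutable heartbeat whose version a periodic task bumps. */
    static final class Heartbeat {
        private final AtomicInteger version;
        Heartbeat(int version) { this.version = new AtomicInteger(version); }
        int version() { return version.get(); }
        void bump() { version.incrementAndGet(); }
        Heartbeat copy() { return new Heartbeat(version()); }
    }

    /** Hypothetical outbound message; it is serialized some time after it is built. */
    static final class OutboundState {
        private final Heartbeat heartbeat;
        OutboundState(Heartbeat heartbeat) { this.heartbeat = heartbeat; }
        // "Serialization" here just reads the version at the moment of the call.
        int serializeVersion() { return heartbeat.version(); }
    }

    /** Returns the heartbeat version that ends up on the wire. */
    static int versionOnWire(boolean snapshotAtConstruction) {
        Heartbeat local = new Heartbeat(10);
        // Either share the live object (the behaviour described above) or copy it.
        OutboundState message =
            new OutboundState(snapshotAtConstruction ? local.copy() : local);
        local.bump(); // the periodic task runs before the message is serialized
        return message.serializeVersion();
    }

    public static void main(String[] args) {
        System.out.println("shared reference: " + versionOnWire(false)); // prints 11
        System.out.println("snapshot copy:    " + versionOnWire(true));  // prints 10
    }
}
```

With the shared reference, the serialized version reflects the later increment rather than
the version seen when the message was built.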

Consider the case where we see {{HeartBeatState}} with version 4 when constructing an Ack
and send it through the {{MessagingService}}. Then, we add some piece of state with version
5 to our local {{EndpointState}}. If {{GossipTask}} runs and increases the {{HeartBeatState}}
version to 6 before the {{MessageOut}} containing the Ack is serialized, the node receiving
the Ack will believe it is current to version 6, despite the fact that it has never received
a message containing the {{ApplicationState}} tagged with version 5.
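
The consequence for the receiving node can be replayed with a small sketch. It assumes a
simplified model in which the sender offers only state whose version is strictly greater
than the version the receiver reports; the class, the method names, and the
{{EXAMPLE_STATE}} key are hypothetical and do not correspond to real Cassandra identifiers.

```java
import java.util.Map;
import java.util.TreeMap;

public class MissedStateSketch {
    /** States with a version strictly greater than the given one (simplified delta selection). */
    static Map<String, Integer> statesNewerThan(Map<String, Integer> states, int version) {
        Map<String, Integer> delta = new TreeMap<>();
        for (Map.Entry<String, Integer> e : states.entrySet()) {
            if (e.getValue() > version) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        return delta;
    }

    /** Replays the scenario and returns what the sender would offer in the next round. */
    static Map<String, Integer> nextRoundDelta(int heartbeatVersionOnWire) {
        Map<String, Integer> senderStates = new TreeMap<>();
        // The Ack was built when the heartbeat was at version 4; afterwards the sender
        // records a new piece of state at version 5 (the key is a placeholder).
        senderStates.put("EXAMPLE_STATE", 5);
        // The receiver applies the Ack and remembers the heartbeat version it carried.
        int receiverKnownVersion = heartbeatVersionOnWire;
        // Next round, the sender offers only state newer than what the receiver reports.
        return statesNewerThan(senderStates, receiverKnownVersion);
    }

    public static void main(String[] args) {
        System.out.println("heartbeat serialized as 6: " + nextRoundDelta(6)); // prints {}
        System.out.println("heartbeat serialized as 4: " + nextRoundDelta(4)); // prints {EXAMPLE_STATE=5}
    }
}
```

When the heartbeat goes out as version 6, the version 5 state is never offered again under
this model, which matches the "permanently missing" symptom in the summary.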

I've reproduced this in several versions; so far, I believe it is possible in all versions.

  was:
In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} from the corresponding
{{EndpointState}} to the {{EndpointState}} to send. When we're getting state for ourselves,
this means that we add a reference to the local {{HeartBeatState}}. Then, once we've built
a message (in either the Syn or Ack handler), we send it through the {{MessagingService}}.
In the case that the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may run
before serialization of the Syn or Ack. This means that when the {{GossipTask}} acquires the
gossip {{taskLock}}, it may increment the {{HeartBeatState}} version of the local node as
stored in the endpoint state map. Then, when we finally serialize the Syn or Ack, we'll follow
the reference to the {{HeartBeatState}} and serialize it with a higher version than we saw
when constructing the Ack or Ack2.

Consider the case where we see {{HeartBeatState}} with version 4 when constructing an Ack
and send it through the {{Messaging Service}}. Then, we add some piece of state with version
5 to our local {{EndpointState}}. If {{GossipTask}} runs and increases the {{HeartBeatState}}
version to 6 before the {{MessageOut}} containing the Ack is serialized, the node receiving
the Ack will believe it is current to version 6, despite the fact that it has never received
a message containing the {{ApplicationState}} tagged with version 5.

I've reproduced in this in several versions; so far, I believe this is possible in all versions.


> Heartbeats can cause gossip information to go permanently missing on certain nodes
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13700
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Joel Knighton
>            Assignee: Joel Knighton
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


