cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node
Date Tue, 08 Sep 2015 10:33:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734586#comment-14734586 ]

Stefania commented on CASSANDRA-10231:
--------------------------------------

This is not going to be easy to reproduce with a dtest, not without injecting some failure into the code. So far I was able to observe this interesting transition by issuing repeated nodetool status commands during a decommission, but I was very lucky: I only saw it once out of several attempts:

{code}
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving  
--  Address    Load       Tokens       Owns    Host ID                               Rack
UL  127.0.0.1  57.39 KB   256          ?       1b91a92c-58b7-470f-82eb-f1e05fc50636  rack1
UN  127.0.0.2  90.56 KB   256          ?       4287fd68-e53d-4b9e-a48b-af374f9e69b3  rack1
UN  127.0.0.3  52.56 KB   256          ?       35a94edb-b38a-4bf3-8318-e14bb8a59eef  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving  
--  Address    Load       Tokens       Owns    Host ID                               Rack
UL  127.0.0.1  57.39 KB   256          ?       null                                  rack1
UN  127.0.0.2  90.56 KB   256          ?       4287fd68-e53d-4b9e-a48b-af374f9e69b3  rack1
UN  127.0.0.3  52.56 KB   256          ?       35a94edb-b38a-4bf3-8318-e14bb8a59eef  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving  
--  Address    Load       Tokens       Owns    Host ID                               Rack
UN  127.0.0.2  90.56 KB   256          ?       4287fd68-e53d-4b9e-a48b-af374f9e69b3  rack1
UN  127.0.0.3  52.56 KB   256          ?       35a94edb-b38a-4bf3-8318-e14bb8a59eef  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
{code}

Because of this observed transition, we know that at some point during the decommission the host id must be null, which means it must be updated to null in {{system.peers}}. My assumption was that if the node crashes while the host id is null in {{system.peers}}, but before the entry is removed entirely, this behavior might be observed. So I patched the C* code not to save the host id in {{system.peers}}, and when I did I got this, which is close but not identical:

{code}
Final status from node 2
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving  
--  Address    Load       Tokens       Owns    Host ID                               Rack
UL  127.0.0.1  63.71 KB   256          ?       null                                  rack1
UN  127.0.0.2  102.39 KB  256          ?       c897de6b-9ec8-4fe2-9835-60bf812c0b22  rack1
{code}

I also saw this exception:

{code}
ERROR [GossipStage:1] 2015-09-08 18:10:22,590 CassandraDaemon.java:191 - Exception in thread Thread[GossipStage:1,5,main]
java.lang.NullPointerException: null
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[na:1.8.0_60]
        at org.apache.cassandra.hints.HintsCatalog.get(HintsCatalog.java:85) ~[main/:na]
        at org.apache.cassandra.hints.HintsService.excise(HintsService.java:267) ~[main/:na]
        at org.apache.cassandra.service.StorageService.excise(StorageService.java:2129) ~[main/:na]
        at org.apache.cassandra.service.StorageService.excise(StorageService.java:2141) ~[main/:na]
        at org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2046) ~[main/:na]
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1660) ~[main/:na]
        at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1191) ~[main/:na]
        at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1173) ~[main/:na]
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1130) ~[main/:na]
        at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) ~[main/:na]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[main/:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
{code}
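
As an aside, the NPE itself is unsurprising once a null host id gets that far: {{ConcurrentHashMap}} rejects null keys, so the {{get}} call in {{HintsCatalog}} fails as soon as it is handed a null id. A minimal standalone illustration (not Cassandra code; the class and variable names here are made up):

{code}
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo
{
    public static void main(String[] args)
    {
        ConcurrentHashMap<UUID, String> storesByHostId = new ConcurrentHashMap<>();
        UUID hostId = null; // what we effectively end up with for the decommissioned peer
        // ConcurrentHashMap.get(null) throws NullPointerException, matching
        // the top frame of the stack trace above
        storesByHostId.get(hostId);
    }
}
{code}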

Here is the [wip dtest|https://github.com/stef1927/cassandra-dtest/commits/10231], but it only works after changing the C* source code as follows:

{code}
stefi@lila:~/git/cstar/cassandra$ git diff
diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java
index 2d9bbec..b84bcf5 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -1701,7 +1701,7 @@ public class StorageService extends NotificationBroadcasterSupport implements IE
                         MigrationManager.instance.scheduleSchemaPull(endpoint, epState);
                         break;
                     case HOST_ID:
-                        SystemKeyspace.updatePeerInfo(endpoint, "host_id", UUID.fromString(value.value));
+                        //SystemKeyspace.updatePeerInfo(endpoint, "host_id", UUID.fromString(value.value));
                         break;
                     case RPC_READY:
                         notifyRpcChange(endpoint, epState.isRpcReady());
@@ -1741,7 +1741,7 @@ public class StorageService extends NotificationBroadcasterSupport implements IE
                     SystemKeyspace.updatePeerInfo(endpoint, "schema_version", UUID.fromString(entry.getValue().value));
                     break;
                 case HOST_ID:
-                    SystemKeyspace.updatePeerInfo(endpoint, "host_id", UUID.fromString(entry.getValue().value));
+                    //SystemKeyspace.updatePeerInfo(endpoint, "host_id", UUID.fromString(entry.getValue().value));
                     break;
             }
         }
{code}

This code in {{StorageService.initServer()}} is suspect:

{code}
        if (Boolean.parseBoolean(System.getProperty("cassandra.load_ring_state", "true")))
        {
            logger.info("Loading persisted ring state");
            Multimap<InetAddress, Token> loadedTokens = SystemKeyspace.loadTokens();
            Map<InetAddress, UUID> loadedHostIds = SystemKeyspace.loadHostIds();
            for (InetAddress ep : loadedTokens.keySet())
            {
                if (ep.equals(FBUtilities.getBroadcastAddress()))
                {
                    // entry has been mistakenly added, delete it
                    SystemKeyspace.removeEndpoint(ep);
                }
                else
                {
                    if (loadedHostIds.containsKey(ep))
                        tokenMetadata.updateHostId(loadedHostIds.get(ep), ep);
                    Gossiper.instance.addSavedEndpoint(ep);
                }
            }
        }
{code}

The endpoint is added as a saved endpoint even when there is no host id, so this might explain the problem, but I still need to investigate further.
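
If that turns out to be the cause, one possible guard (just a sketch, untested, against the snippet above) would be to only re-add the saved endpoint when a host id was actually loaded, and otherwise treat the leftover row as stale:

{code}
                else
                {
                    if (loadedHostIds.containsKey(ep))
                    {
                        tokenMetadata.updateHostId(loadedHostIds.get(ep), ep);
                        Gossiper.instance.addSavedEndpoint(ep);
                    }
                    else
                    {
                        // no host id for this peer: most likely a partially-removed
                        // entry left behind by an interrupted decommission, so drop
                        // it instead of resurrecting it in gossip
                        SystemKeyspace.removeEndpoint(ep);
                    }
                }
{code}

Whether it is actually safe to drop the row at startup, rather than, say, still adding the endpoint but deferring to gossip for the host id, still needs checking.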


> Null status entries on nodes that crash during decommission of a different node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Stefania
>             Fix For: 3.0.x
>
>
> This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during the decommission of a different node, it may start with a null entry for the decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon a restart of the affected node.
> This issue is further detailed in ticket [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].


