cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node
Date Sat, 19 Sep 2015 22:16:04 GMT


Stefania commented on CASSANDRA-10231:

I've attached [a patch|] for 3.0 that fixes the dtest. [~jkni], would you mind trying the
patch with your Jepsen test? It also logs a message for every GOSSIP entry, so the log files
may get a bit bigger, but we will have helpful information should the patch not work.

The patch fixes the exception below, which causes any GOSSIP properties that follow STATUS
and would be applied by {{onChange}} to be skipped:

ERROR [GossipStage:2] 2015-09-19 14:14:31,007 - Exception in thread
java.lang.NullPointerException: null
        at java.util.concurrent.ConcurrentHashMap.get( ~[na:1.8.0_60]
        at org.apache.cassandra.hints.HintsCatalog.get( ~[main/:na]
        at org.apache.cassandra.hints.HintsService.excise( ~[main/:na]
        at org.apache.cassandra.service.StorageService.excise( ~[main/:na]
        at org.apache.cassandra.service.StorageService.excise( ~[main/:na]
        at org.apache.cassandra.service.StorageService.handleStateLeft(
        at org.apache.cassandra.service.StorageService.onChange(
        at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications( ~[main/:na]
        at org.apache.cassandra.gms.Gossiper.applyNewStates( ~[main/:na]
        at org.apache.cassandra.gms.Gossiper.applyStateLocally( ~[main/:na]
        at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(
        at ~[main/:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
        at ~[na:1.8.0_60]

According to the additional GOSSIP trace message that I've added, host id was one such property.
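
To illustrate the failure mode, here is a minimal standalone sketch, not the Cassandra code. It assumes the key passed to the map lookup was null (a {{ConcurrentHashMap}} throws a NullPointerException on a null key) and that entries are applied in order, so an exception raised while handling STATUS leaves the later entries unapplied. The class, method, and entry names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GossipSkipSketch {
    // Hypothetical stand-in for a catalog keyed by host id.
    static final ConcurrentHashMap<String, String> CATALOG = new ConcurrentHashMap<>();

    // Hypothetical listener: handling STATUS looks up the host id,
    // which is assumed to be null if it has not been applied yet.
    static void onChange(String key, String value, String hostId) {
        if (key.equals("STATUS")) {
            CATALOG.get(hostId); // ConcurrentHashMap.get(null) throws NullPointerException
        }
    }

    // Applies the entries in iteration order and returns the keys whose
    // notification completed before any exception was thrown.
    static List<String> applyNewStates(Map<String, String> states, String hostId) {
        List<String> applied = new ArrayList<>();
        try {
            for (Map.Entry<String, String> e : states.entrySet()) {
                onChange(e.getKey(), e.getValue(), hostId);
                applied.add(e.getKey());
            }
        } catch (NullPointerException ex) {
            // The loop is abandoned here: entries after STATUS are never notified.
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<String, String> states = new LinkedHashMap<>();
        states.put("LOAD", "1.0");
        states.put("STATUS", "LEFT");
        states.put("HOST_ID", "abc");

        System.out.println(applyNewStates(states, null));  // prints [LOAD]
        System.out.println(applyNewStates(states, "abc")); // prints [LOAD, STATUS, HOST_ID]
    }
}
```

With a null host id, the HOST_ID entry that follows STATUS is never applied in this sketch, which mirrors the skipped property described above.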

> Null status entries on nodes that crash during decommission of a different node
> -------------------------------------------------------------------------------
>                 Key: CASSANDRA-10231
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Stefania
>             Fix For: 3.0.x
> This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during the decommission of a different node, it may start with a null entry for the decommissioned node like so:
> DN ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon a restart of the affected node.
> This issue is further detailed in ticket [10068|].

