cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal
Date Tue, 20 Oct 2015 02:00:32 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964380#comment-14964380 ]

Stefania commented on CASSANDRA-10089:
--------------------------------------

We reproduced the issue of tokens missing when the status is NORMAL with [this failed
test|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10089-2.2-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_reversed_test/],
this time with TRACE-level logging enabled for Gossiper. I've replaced the files attached to
this ticket with the log files from this latest test. The ERROR occurs on node 1 because it
receives status NORMAL but no tokens for node 2, from node 2, at around 09:31:16,086.

The problem is the high-scale-lib {{NonBlockingHashMap}} in {{EndpointState}}. Even though we
are careful to add the tokens before the status, the gossip thread sometimes sees status NORMAL
but no tokens. I've reproduced this several times on my machine with [this unit test|https://github.com/stef1927/cassandra/commit/275564fa568f47bb136c13e38ad918c4c4fcb944#diff-9c186d237f8b9eda310c20fc4a8c314bR41].

I'm not sure it's OK to replace {{NonBlockingHashMap}} with {{ConcurrentHashMap}}, since
this would have a performance impact. Alternatively, we could check whether there is a later
version of {{NonBlockingHashMap}}, or a different thread-safe hash map implementation, that
guarantees that if we see a value while iterating, we also see all values inserted or modified
before it. cc [~brandon.williams] for his knowledge of Gossip and [~benedict] for his knowledge
of hash map implementations.
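
To make the desired guarantee concrete, here is a minimal sketch of one way to get it: publish an immutable snapshot of the map through an {{AtomicReference}} and replace it on every write. This is not the Cassandra code and not a proposed patch; {{SnapshotState}}, {{Key}}, {{add}} and {{snapshot}} are hypothetical names used only for illustration, and the performance cost of copying on each write is not assessed here.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for EndpointState; all names here are illustrative.
final class SnapshotState {
    enum Key { TOKENS, STATUS }

    // Readers only ever observe a fully built, immutable map.
    private final AtomicReference<Map<Key, String>> ref =
        new AtomicReference<Map<Key, String>>(
            Collections.unmodifiableMap(new EnumMap<Key, String>(Key.class)));

    // Copy the current map, apply the update, and publish it with compare-and-set.
    // Every published map contains all entries of the map it replaced.
    void add(Key key, String value) {
        while (true) {
            Map<Key, String> current = ref.get();
            Map<Key, String> copy = new EnumMap<Key, String>(Key.class);
            copy.putAll(current);
            copy.put(key, value);
            if (ref.compareAndSet(current, Collections.unmodifiableMap(copy))) {
                return;
            }
            // Another writer published first; retry against the newer map.
        }
    }

    // A reader iterates over one snapshot, so if it sees STATUS it also sees
    // every entry that was added before STATUS.
    Map<Key, String> snapshot() {
        return ref.get();
    }
}
```

With this shape, a writer that calls {{add(Key.TOKENS, ...)}} and then {{add(Key.STATUS, ...)}} can never expose a snapshot containing STATUS without TOKENS, because the snapshot that first contains STATUS was copied from one that already contained TOKENS.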

> NullPointerException in Gossip handleStateNormal
> ------------------------------------------------
>
>                 Key: CASSANDRA-10089
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x
>
>         Attachments: node1_debug.log, node2_debug.log, node3_debug.log
>
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/]
in 2.2:
> {code}
> Unexpected error in node1 node log:
> ERROR [GossipStage:1] 2015-08-14 15:39:57,873 CassandraDaemon.java:183 - Exception in thread Thread[GossipStage:1,5,main]
> java.lang.NullPointerException: null
> 	at org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731) ~[main/:na]
> 	at org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804) ~[main/:na]
> 	at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857) ~[main/:na]
> 	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629) ~[main/:na]
> 	at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) ~[main/:na]
> 	at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) ~[main/:na]
> 	at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) ~[main/:na]
> 	at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) ~[main/:na]
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[main/:na]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
> 	at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]
> {code}
> I wasn't able to find it on unpatched branches, but it is clearly not related to CASSANDRA-9970;
> if anything, it could have been a side effect of CASSANDRA-9871.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
