cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13407) test failure at RemoveTest.testBadHostId
Date Thu, 06 Apr 2017 17:01:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959301#comment-15959301 ]

Joel Knighton commented on CASSANDRA-13407:
-------------------------------------------

For posterity, this is the race possible when the Gossiper is started, as far as I can tell.

In setup, we initialize a fake ring using Util.createInitialRing. This will initialize the
nodes in an unsafe manner and then inject the token states. If a status check runs before
the token state is set, the previously decommissioned node will look like a fat client, since
it won't have tokens and will not have a DEAD_STATE. Since we aren't gossiping, we won't have
heard from it for longer than fatClientTimeout, so we'll remove it. If this races with the
ss.onChange in createInitialRing, we can remove the endpoint state while processing it, which
will cause the NPE shown in the issue's example trace. This race can be seen at 16:15:51,205
in the log linked from the test failure.
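
To make the interleaving concrete, here is a minimal sketch that replays it deterministically
in a single thread. It is not Cassandra code: the class, the EndpointState fields, the
statusCheck and onChange methods, the endpoint address, and the timeout value are all
hypothetical stand-ins chosen for illustration. It only shows the shape of the problem, an
eviction that lands between registering an endpoint and processing its state change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the race; none of these types are Cassandra's real classes.
class FatClientRaceSketch {
    /** Stand-in for the per-endpoint gossip state. */
    static final class EndpointState {
        boolean hasTokens;     // becomes true once the token state is injected
        boolean deadState;     // true for a status such as "left"
        long lastHeardMillis;  // last time we heard from the node
        String hostId = "host-1";
    }

    // Assumed value, for illustration only.
    static final long FAT_CLIENT_TIMEOUT_MILLIS = 30_000;

    final Map<String, EndpointState> endpointStates = new HashMap<>();

    /** Models the periodic status check: evicts endpoints that look like silent fat clients. */
    void statusCheck(long nowMillis) {
        endpointStates.values().removeIf(s ->
            !s.hasTokens
                && !s.deadState
                && nowMillis - s.lastHeardMillis > FAT_CLIENT_TIMEOUT_MILLIS);
    }

    /** Models the state-change handler: looks the endpoint up again and dereferences it. */
    String onChange(String endpoint) {
        EndpointState state = endpointStates.get(endpoint);
        return state.hostId; // NullPointerException if the entry was evicted in between
    }

    /** Replays one interleaving; returns true if onChange hit a NullPointerException. */
    static boolean replay(boolean statusCheckRunsBeforeTokens) {
        FatClientRaceSketch gossip = new FatClientRaceSketch();
        EndpointState state = new EndpointState();
        state.lastHeardMillis = 0; // we are not gossiping, so this never advances
        gossip.endpointStates.put("127.0.0.2", state);

        long now = FAT_CLIENT_TIMEOUT_MILLIS + 1;
        if (statusCheckRunsBeforeTokens) {
            gossip.statusCheck(now); // no tokens yet, so the node is evicted
        }
        state.hasTokens = true;      // token state injected
        if (!statusCheckRunsBeforeTokens) {
            gossip.statusCheck(now); // tokens present, so the node is kept
        }

        try {
            gossip.onChange("127.0.0.2");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("status check before tokens: NPE=" + replay(true));
        System.out.println("tokens before status check: NPE=" + replay(false));
    }
}
```

In the real test the two steps run on different threads, so the failing order only shows up
occasionally; not starting the Gossiper in the test removes the status check from the picture
entirely.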

We also need to remove SchemaLoader.loadSchema() as you did in the patch - this is because
it starts the Gossiper as well. This is fine; we don't appear to need it.

The patch looks good - the race exists in theory on 2.1/2.2, but it appears to only manifest
on 3.0+. I don't think it is worth committing to 2.1 for that reason - let's commit to 2.2 and
later, and run the test at least once on each branch before committing.



> test failure at RemoveTest.testBadHostId
> ----------------------------------------
>
>                 Key: CASSANDRA-13407
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13407
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>
> Example trace:
> {code}
> java.lang.NullPointerException
> 	at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:881)
> 	at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:876)
> 	at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2201)
> 	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1855)
> 	at org.apache.cassandra.Util.createInitialRing(Util.java:216)
> 	at org.apache.cassandra.service.RemoveTest.setup(RemoveTest.java:89)
> {code} 
> [failure example|https://cassci.datastax.com/job/trunk_testall/1491/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/]
> [history|https://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/history/]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
