cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10205) decommissioned_wiped_node_can_join_test fails on Jenkins
Date Wed, 09 Sep 2015 02:56:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736048#comment-14736048 ]

Stefania edited comment on CASSANDRA-10205 at 9/9/15 2:56 AM:
--------------------------------------------------------------

The third CI run was also successful. We need a reviewer for the C* patch.

Here is a recap: the fix for the dtest is to add {{wait_other_notice}} when stopping the node
after decommissioning (otherwise the test is flaky). We also need the C* patch to mark the
node as dead when a decommissioned node is stopped; otherwise:
* {{wait_other_notice}} hangs, because the {{is now DOWN}} notification never appears in the
logs, and
* the sockets between the processes are not closed, so when the node is restarted it does not
receive GOSSIP replies.
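
Why a missing {{is now DOWN}} line makes {{wait_other_notice}} block can be sketched as a simple polling loop over the other nodes' logs. This is a minimal illustration, not ccm's actual implementation: the function name {{wait_for_down_notice}}, the {{read_logs}} callback and the assumption that peers log a line containing {{/<address> is now DOWN}} are all hypothetical. Where the real harness keeps waiting, this sketch simply returns False once its timeout expires.

```python
import re
import time
from typing import Callable, List


def wait_for_down_notice(read_logs: Callable[[], List[str]], address: str,
                         timeout: float = 5.0, poll: float = 0.1) -> bool:
    """Hypothetical helper: poll the peers' log lines until one of them
    reports `address` as DOWN.

    Returns True if the notification shows up before `timeout` seconds,
    False otherwise (the case where a stopped node is never marked dead).
    """
    # Assumed log format: "... /127.0.0.4 is now DOWN"
    pattern = re.compile("/" + re.escape(address) + " is now DOWN")
    deadline = time.monotonic() + timeout
    while True:
        if any(pattern.search(line) for line in read_logs()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

For example, {{wait_for_down_notice(lambda: ["InetAddress /127.0.0.4 is now DOWN"], "127.0.0.4")}} returns True immediately, whereas if the decommissioned node is stopped without being marked dead, no such line is ever written and the call only returns (False) after the timeout.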

Once the review is OK, we need to backport the C* patch to 2.0+, since the test fails on all
branches.


> decommissioned_wiped_node_can_join_test fails on Jenkins
> --------------------------------------------------------
>
>                 Key: CASSANDRA-10205
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10205
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: Stefania
>            Assignee: Stefania
>         Attachments: decommissioned_wiped_node_can_join_test.tar.gz
>
>
> This test passes locally but reliably fails on Jenkins. It seems that after we restart node4, it is unable to gossip with the other nodes:
> {code}
> INFO  [HANDSHAKE-/127.0.0.2] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.2
> INFO  [HANDSHAKE-/127.0.0.1] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.1
> INFO  [HANDSHAKE-/127.0.0.3] 2015-08-27 06:50:42,778 OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.3
> ERROR [main] 2015-08-27 06:51:13,785 CassandraDaemon.java:635 - Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>         at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1342) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:518) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:763) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:687) ~[main/:na]
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:570) ~[main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:320) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) [main/:na]
> WARN  [StorageServiceShutdownHook] 2015-08-27 06:51:13,799 Gossiper.java:1453 - No local state or state is in silent shutdown, not announcing shutdown
> {code}
> Both the addresses and the port numbers of the seeds seem correct, so I don't think the problem is the Amazon private addresses, but I might be wrong.
> It's also worth noting that the node starts up without problems the first time; the problem only occurs during a restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
