cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Russ Hatch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
Date Tue, 27 Jan 2015 23:00:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294364#comment-14294364
] 

Russ Hatch commented on CASSANDRA-8072:
---------------------------------------

I have been able to reproduce this issue using the steps provided in CASSANDRA-8422.

During one repro attempt I tried to run 'nodetool gossipinfo' in a loop on the non-seed node
(just before the reported exception was expected to occur), and was surprised to see it complain
about attempting to use a closed connection.

Using netstat I had a look at the seed and the non-seed node, and can see a lingering CLOSE_WAIT
connection on the seed node -- I'm wondering if cassandra could somehow be trying to reuse
this stale connection, making the seed unable to connect back to the non-seed (and making
it think the seed is unavailable).

It may also be relevant that the non-seed node has a connection in state FIN_WAIT2 for approx.
10-15 seconds after stopping the cassandra process.

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan Springer
>            Assignee: Brandon Williams
>         Attachments: casandra-system-log-with-assert-patch.log
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster in either
ec2 or locally, an error occurs sometimes with one of the nodes refusing to start C*.  The
error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered
during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>         at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279)
Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line
701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line
941) MessagingService has terminated the accept() thread
> This errors does not always occur when provisioning a 2-node cluster, but probably around
half of the time on only one of the nodes.  I haven't been able to reproduce this error with
DSC 2.0.9, and there have been no code or definition file changes in Opscenter.
> I can reproduce locally with the above steps.  I'm happy to test any proposed fixes
since I'm the only person able to reproduce reliably so far.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message