cassandra-commits mailing list archives

From "Ryan Springer (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
Date Wed, 28 Jan 2015 13:33:35 GMT


Ryan Springer commented on CASSANDRA-8072:

No problem with all the questions.  The more information we have on this issue, the better.

First Startup:

- The DSC deb/rpm packages are installed by the agent.  Part of the scripts in the deb/rpm
automatically starts DSC when the package is installed.
- No changes are made to cassandra.yaml before this initial start from the packaged scripts.
- Initially the nodes are not aware of each other as seeds, because the cassandra.yaml being
used is the one from the package.
- The initial install is made in parallel in batches of 20 nodes at a time (configurable
with the Opscenter install_throttle parameter).  However, I am seeing the problem with just
2 nodes in the cluster, so I don't think the throttle is involved.
- I will do a run of 2 nodes and post the cassandra.yaml files.


Stopping:

- The nodes are stopped in parallel.
- It looks as though Opscenter waits for the "apt-get install" or equivalent rpm command to
return from the DSC package installation and then Opscenter considers the node to be initially
started.  Once the package install commands have finished for all nodes, then Opscenter begins
to stop all of the DSC instances.  If the package install command returns before DSC is completely
initialized, that could be related to this issue.
- The nodes are stopped with: pkill -f CassandraDaemon

Starting again:

- The DSC nodes are restarted serially, with the seed nodes being started before non-seed
nodes.  The seeds are first sorted by string comparison and then started one at a time in
that order.
- Opscenter will wait for all DSC instances to have been started, then it will restart the
agents, wait for them to reconnect to Opscenter, and then Opscenter considers the provisioning
to be finished.
- I will grab 2 cassandra.yaml configs for this stage as well.
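The restart ordering described above can be sketched as follows. This is an illustration, not the Opscenter code: the function name start_order is hypothetical, and keeping non-seed nodes in their input order is an assumption, since only the seed ordering is described here. Note that plain string comparison puts "10.0.0.10" ahead of "10.0.0.2".

```python
def start_order(nodes, seeds):
    """Return the order in which nodes would be started one at a time.

    Seeds come first, sorted by plain string comparison; non-seed nodes
    follow in their original order (an assumption for this sketch).
    """
    seed_set = set(seeds)
    ordered_seeds = sorted(n for n in nodes if n in seed_set)
    non_seeds = [n for n in nodes if n not in seed_set]
    return ordered_seeds + non_seeds


# Example addresses are made up for illustration.
order = start_order(
    nodes=["10.0.0.2", "10.0.0.10", "10.0.0.3"],
    seeds=["10.0.0.2", "10.0.0.10"],
)
print(order)
```

With these made-up addresses the seed 10.0.0.10 starts before 10.0.0.2, because the comparison is character by character rather than numeric.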

From my reading of the code, I believe the ec2 nodes will refer to each other using public
IPs, but I will verify from a real run.

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>                 Key: CASSANDRA-8072
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan Springer
>            Assignee: Brandon Williams
>         Attachments: casandra-system-log-with-assert-patch.log
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster in either
> ec2 or locally, an error occurs sometimes with one of the nodes refusing to start C*.  The
> error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>         at org.apache.cassandra.gms.Gossiper.doShadowRound(
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(
>         at org.apache.cassandra.service.StorageService.prepareToJoin(
>         at org.apache.cassandra.service.StorageService.initServer(
>         at org.apache.cassandra.service.StorageService.initServer(
>         at org.apache.cassandra.service.CassandraDaemon.setup(
>         at org.apache.cassandra.service.CassandraDaemon.activate(
>         at org.apache.cassandra.service.CassandraDaemon.main(
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 (line 701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/] 2014-10-06 15:54:54,327 (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but probably around
> half of the time, and on only one of the nodes.  I haven't been able to reproduce this error with
> DSC 2.0.9, and there have been no code or definition file changes in Opscenter.
> I can reproduce locally with the above steps.  I'm happy to test any proposed fixes
> since I'm the only person able to reproduce reliably so far.
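Since the failure only shows up on some runs, a small check against each node's system.log makes repeated provisioning attempts easier to tally. This is a minimal sketch: the helper name failed_to_gossip is hypothetical, and it simply looks for the exception message quoted in the description above.

```python
GOSSIP_ERROR = "Unable to gossip with any seeds"


def failed_to_gossip(log_lines):
    """Return True if any log line contains the startup gossip failure."""
    return any(GOSSIP_ERROR in line for line in log_lines)


# Two lines taken from the log excerpt in the issue description.
sample = [
    "ERROR [main] 2014-10-06 15:54:52,292 (line 513) Exception encountered during startup",
    "java.lang.RuntimeException: Unable to gossip with any seeds",
]
print(failed_to_gossip(sample))
```

On a node, the same function could be fed the open file object for /var/log/cassandra/system.log, since iterating a file yields its lines.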

This message was sent by Atlassian JIRA
