Date: Wed, 28 Jan 2015 13:33:35 +0000 (UTC)
From: "Ryan Springer (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

    [ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295134#comment-14295134 ]

Ryan Springer commented on CASSANDRA-8072:
------------------------------------------

No problem with all the questions. The more information we have on this issue, the better.

First Startup:
- The DSC deb/rpm packages are installed by the agent. Part of the scripts in the deb/rpm automatically starts DSC when the package is installed.
- No changes are made to cassandra.yaml before this initial start from the packaged scripts.
- Initially the nodes are not aware of each other as seeds, because the cassandra.yaml being used is the one from the package.
- The initial install is made in parallel in batches of 20 nodes at a time (configurable with the Opscenter install_throttle parameter). However, I am seeing the problem with just 2 nodes in the cluster, so I don't think the throttle is involved.
- I will do a run of 2 nodes and post the cassandra.yaml files.

Stopping:
- The nodes are stopped in parallel.
- It looks as though Opscenter waits for the "apt-get install" or equivalent rpm command to return from the DSC package installation and then Opscenter considers the node to be initially started. Once the package install commands have finished for all nodes, Opscenter begins to stop all of the DSC instances. If the package install command returns before DSC is completely initialized, that could be related to this issue.
- The nodes are stopped with: pkill -f CassandraDaemon

Starting again:
- The DSC nodes are restarted serially, with the seed nodes being started before non-seed nodes. The seeds are first sorted by string comparison and then started one at a time in that order.
- Opscenter will wait for all DSC instances to have been started, then it will restart the agents, wait for them to reconnect to Opscenter, and then Opscenter considers the provisioning to be finished.
- I will grab 2 cassandra.yaml configs for this stage as well.

From my reading of the code, I believe the ec2 nodes will refer to each other using public IPs, but I will verify from a real run.
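
The restart order described under "Starting again" can be sketched roughly as follows. This is only an illustration, not Opscenter's actual code: the function name `restart_order` and the sample addresses are hypothetical, and keeping non-seed nodes in their input order is an assumption, since only the seed ordering is stated above. It also shows that string comparison orders IP addresses differently from numeric comparison.

```python
def restart_order(nodes, seeds):
    """Return nodes in restart order: seeds first, sorted as plain strings,
    then the non-seed nodes.

    Hypothetical helper for illustration only. The order of non-seed nodes
    is an assumption (input order is preserved here).
    """
    seed_set = set(seeds)
    # Plain string comparison, as described for the seed nodes.
    seed_nodes = sorted(n for n in nodes if n in seed_set)
    other_nodes = [n for n in nodes if n not in seed_set]
    return seed_nodes + other_nodes


if __name__ == "__main__":
    # Made-up addresses, chosen to show the string-ordering effect.
    nodes = ["10.0.0.10", "10.0.0.2", "10.0.0.7"]
    seeds = ["10.0.0.2", "10.0.0.10"]
    # "10.0.0.10" sorts before "10.0.0.2" because '1' < '2' as characters.
    print(restart_order(nodes, seeds))
```

With string ordering, the seed that starts first is not necessarily the one with the numerically lowest address, which may matter when reasoning about which seed is up when the next node attempts its shadow round.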

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan Springer
>            Assignee: Brandon Williams
>         Attachments: casandra-system-log-with-assert-patch.log
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster in either ec2 or locally, an error occurs sometimes with one of the nodes refusing to start C*. The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>         at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but probably around half of the time, and on only one of the nodes.
> I haven't been able to reproduce this error with DSC 2.0.9, and there have been no code or definition file changes in Opscenter.
> I can reproduce locally with the above steps. I'm happy to test any proposed fixes since I'm the only person able to reproduce reliably so far.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)