Date: Fri, 10 Apr 2015 23:41:13 +0000 (UTC)
From: "Brandon Williams (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

     [ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-8072:
----------------------------------------
    Attachment: 8072.txt

Now we're getting somewhere.
It starts here, after the seed receives the dead state for the decommissioned node:

{noformat}
DEBUG [GossipStage:1] 2015-04-10 22:05:10,147 ReconnectableSnitchHelper.java (line 70) Intiated reconnect to an Internal IP /10.2.1.139 for the /54.219.189.161
{noformat}

Later, the seed receives the SYN and tries to send the ACK, but it tries to send it over the previous internal IP:

{noformat}
DEBUG [ACCEPT-/10.2.0.71] 2015-04-10 22:06:45,576 MessagingService.java (line 917) Connection version 7 from /54.219.189.161
DEBUG [Thread-11] 2015-04-10 22:06:45,621 MessagingService.java (line 780) Setting version 7 for /54.219.189.161
DEBUG [Thread-11] 2015-04-10 22:06:45,621 IncomingTcpConnection.java (line 107) Set version for /54.219.189.161 to 7 (will use 7)
TRACE [GossipStage:1] 2015-04-10 22:06:45,658 GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage from /54.219.189.161
TRACE [GossipStage:1] 2015-04-10 22:06:45,660 Gossiper.java (line 768) local heartbeat version 179776 greater than 0 for /54.219.189.161
TRACE [GossipStage:1] 2015-04-10 22:06:45,666 GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage to /54.219.189.161
TRACE [GossipStage:1] 2015-04-10 22:06:45,666 MessagingService.java (line 660) /54.219.189.162 sending GOSSIP_DIGEST_ACK to 399@/54.219.189.161
DEBUG [WRITE-/54.219.189.161] 2015-04-10 22:06:45,666 OutboundTcpConnection.java (line 290) attempting to connect to /10.2.1.139
{noformat}

It seems like the 'new' 161 isn't binding this IP, which is fine depending on your circumstance, but at least one problem we have is that we shouldn't be sending the onJoin event for a dead state, which is what triggers the initial reconnect.  I can't think of any reason we'd want to send that event upon discovery of any dead state, so the attached patch only sends it for live states.

That said, I don't think this is the original cause, because when I've seen it I was using neither INTERNAL_IP nor a reconnecting snitch.
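
To make the proposed guard concrete, here is a minimal sketch of "only send onJoin for live states". It is not the contents of 8072.txt, and every type and method name in it (GossipJoinSketch, EndpointState, JoinListener, handleNewState) is a hypothetical stand-in rather than Cassandra's real API; it only models the idea that a subscriber such as a reconnecting snitch helper should never be notified about an endpoint first seen in a dead state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model; these are NOT Cassandra's real classes. */
final class GossipJoinSketch {

    /** Stand-in for a gossiped endpoint state: an address plus a dead flag. */
    static final class EndpointState {
        final String address;
        final boolean dead; // true for e.g. a decommissioned node

        EndpointState(String address, boolean dead) {
            this.address = address;
            this.dead = dead;
        }
    }

    /** Stand-in for a subscriber, e.g. something that reconnects on join. */
    interface JoinListener {
        void onJoin(EndpointState state);
    }

    private final List<JoinListener> listeners = new ArrayList<>();

    void register(JoinListener listener) {
        listeners.add(listener);
    }

    /** Called when gossip first learns about an endpoint (assumed entry point). */
    void handleNewState(EndpointState state) {
        if (state.dead) {
            // Dead state: skip onJoin so no listener tries to reconnect
            // to an address the node may no longer be bound to.
            return;
        }
        for (JoinListener listener : listeners) {
            listener.onJoin(state);
        }
    }

    public static void main(String[] args) {
        GossipJoinSketch gossiper = new GossipJoinSketch();
        List<String> joined = new ArrayList<>();
        gossiper.register(state -> joined.add(state.address));

        // Dead state for the decommissioned node: no onJoin is fired.
        gossiper.handleNewState(new EndpointState("54.219.189.161", true));
        // Live state: listeners are notified as usual.
        gossiper.handleNewState(new EndpointState("54.219.189.162", false));

        System.out.println("onJoin fired for: " + joined);
    }
}
```

In this sketch the dead endpoint is simply skipped; a real gossiper would still record the dead state, it just would not announce it as a join.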

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan Springer
>            Assignee: Brandon Williams
>             Fix For: 2.0.15, 2.1.5
>
>         Attachments: 8072.txt, cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster, either in EC2 or locally, an error sometimes occurs with one of the nodes refusing to start C*.  The error in /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>         at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but probably around half of the time, and on only one of the nodes.  I haven't been able to reproduce this error with DSC 2.0.9, and there have been no code or definition file changes in Opscenter.
> I can reproduce locally with the above steps.  I'm happy to test any proposed fixes since I'm the only person able to reproduce reliably so far.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)