From: "Desimpel, Ignace"
To: user@cassandra.apache.org
Subject: FW: Sporadic gossip exception on add node
Date: Mon, 24 Feb 2014 10:43:24 +0000

I had a look at the code, and this might be a race condition in the functions StorageService::checkForEndpointCollision and StorageService::prepareReplacementInfo.

 

For Gossiper.instance.doShadowRound() to work, MessagingService.instance().listen(FBUtilities.getLocalAddress()) must be FULLY running (i.e. accepting connections).

However, the listen function starts SocketThread threads but does not wait for them to be running. So I think that, at least in theory, the doShadowRound function can send messages, and thus expect answers, while there is no guarantee that the listeners are actually up and running.
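To make the suspected failure mode concrete, here is a minimal, hypothetical model of a shadow round. The class ShadowRoundSketch and its methods are illustrative only and are not Cassandra's Gossiper API: the node sends a request to each seed and then waits, with a timeout, for the first reply to arrive through its inbound listener. In this model, if the listener is not yet accepting connections when a seed answers, onSeedReply() is never called and the wait ends with the same 'Unable to gossip with any seeds' error.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical model of a "shadow round"; NOT Cassandra's Gossiper implementation.
final class ShadowRoundSketch {
    // Released by the first seed reply that reaches the inbound listener.
    private final CountDownLatch firstReply = new CountDownLatch(1);

    // In this model, called by the inbound listener when any seed answers.
    // If the listener is not accepting yet, this is simply never called.
    void onSeedReply() {
        firstReply.countDown();
    }

    // Sends a request to every seed, then waits (bounded) for the first reply.
    void doShadowRound(List<String> seeds, Consumer<String> send, long timeoutMillis) {
        for (String seed : seeds) {
            send.accept(seed); // the answer arrives asynchronously via onSeedReply()
        }
        boolean replied;
        try {
            replied = firstReply.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new RuntimeException("Interrupted while waiting for seeds", e);
        }
        if (!replied) {
            // No reply within the timeout: same message as in the log.
            throw new RuntimeException("Unable to gossip with any seeds");
        }
    }
}
```

The timeout is a plain parameter here; the sketch makes no claim about the value Cassandra itself uses.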

 

As a test, I modified the MessagingService::listen code as follows:

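SocketThread th = new SocketThread(ss, "ACCEPT-" + localEp);
synchronized (th) {
    th.start();
    // Block until SocketThread.run() calls notifyAll(). Because the lock on th is
    // held before start(), the notify cannot happen before wait() releases it.
    try { th.wait(); } catch (Throwable tt) {}
}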

 

And the SocketThread::run function:

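public void run()
{
    synchronized (this) {
        this.notifyAll(); // wake up listen(), which is waiting on this thread object
    }
    // ... original body of run() (the server.accept() loop) continues unchanged
}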

 

That way there is little chance that the socket thread is not yet running (it should be blocked in the server.accept() call).
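The same "start the thread, then wait until it is really running" handshake can also be written with java.util.concurrent.CountDownLatch, which cannot lose the signal and does not return early on a spurious wakeup the way a bare wait() outside a condition loop can. A minimal sketch, assuming a hypothetical ReadyAcceptThread class that stands in for (and is not) Cassandra's SocketThread. As in the patch above, the signal is raised just before accept() is called, so it guarantees the thread is running, not that it is already blocked inside accept().

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an accept thread; NOT Cassandra's SocketThread.
final class ReadyAcceptThread extends Thread {
    private final ServerSocket server;
    private final CountDownLatch started = new CountDownLatch(1);
    final AtomicInteger accepted = new AtomicInteger(); // connections accepted so far

    ReadyAcceptThread(ServerSocket server, String name) {
        super(name);
        this.server = server;
        setDaemon(true);
    }

    // Plays the role of the synchronized/start/wait block: start, then block until run() has begun.
    void startAndAwait() throws InterruptedException {
        start();
        started.await(); // returns only after countDown() has been called
    }

    @Override
    public void run() {
        started.countDown(); // signal "running"; accept() is called right after this
        while (!server.isClosed()) {
            try (Socket socket = server.accept()) {
                accepted.incrementAndGet(); // a real server would hand the socket off instead of closing it
            } catch (IOException e) {
                return; // server socket closed (or failed): stop accepting
            }
        }
    }
}
```

In the listen code, a call such as th.startAndAwait() would take the place of the synchronized block; whether that removes the sporadic failure is exactly what the experiment above is meant to show.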

 

 

Regards,

Ignace Desimpel

 

From: Desimpel, Ignace [mailto:Ignace.Desimpel@nuance.com]
Sent: Thursday, 6 February 2014 12:15
To: user@cassandra.apache.org
Subject: Sporadic gossip exception on add node

 

Environment: Linux, Cassandra 2.0.4, 3 nodes, embedded, ByteOrderedPartitioner, LCS (LeveledCompactionStrategy)

 

When I add a node to the existing 3-node cluster, I sometimes get the exception ‘Unable to gossip with any seeds’ listed below. If I simply restart it without any change, it usually works, so it must be some timing issue.

 

At that time, Cassandra is configured using the cassandra.yaml file, with auto_bootstrap set to true and initial_token set to something like: 00f35256, 041e692a, 0562d8b2, 0930274a, 0b16ce96, 0c5b3e1e, 10cac47a, 12b16bc6, 13f5db4e, 186561aa, 1907996e, 1c32b042, 1e19578e ……

 

The two seeds configured in this yaml are 10.164.8.250 and 10.164.8.249, and both are up and running.

The new node being added has IP 10.164.8.93.

 

At the time of the exception, I do not see the gossip message ‘Handshaking version with /10.164.8.93’ on the seeds.

If the exception does not occur, I do see that gossip message ‘Handshaking version with /10.164.8.93’ on the seed.

 

2014-01-31 13:40:36.380 Loading persisted ring state
2014-01-31 13:40:36.386 Starting Messaging Service on port 9804
2014-01-31 13:40:36.408 Handshaking version with /10.164.8.250
2014-01-31 13:40:36.408 Handshaking version with /10.164.8.249

2014-01-31 13:41:07.415 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1160) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:426) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:618) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:586) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:485) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.startUpCassandra(CassandraStore.java:469) [landc-services-search-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT-87937]
        at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.run(CassandraStore.java:460) [landc-services-search-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT-87937]
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1160)
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:426)
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:618)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:586)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:485)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
        at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.startUpCassandra(CassandraStore.java:469)
        at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.run(CassandraStore.java:460)
Exception encountered during startup: Unable to gossip with any seeds

2014-01-31 13:41:07.419 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException: null
        at org.apache.cassandra.service.StorageService.stopNativeTransport(StorageService.java:349) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:364) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.access$3(StorageService.java:361) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:551) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_40]
2014-01-31 13:41:07.420 ShutDownHook requests shutdown on be.landc.services.cdi.server.cassandra.CDIServer@7c32d1a3
2014-01-31 13:41:07.421 Shutdown server request
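The timestamps in this log also quantify the wait between sending the handshakes to the two seeds and giving up. A small java.time sketch of that arithmetic; the LogInterval helper is illustrative, and the two timestamps are copied from the log lines ‘Handshaking version with /10.164.8.249’ and ‘Exception encountered during startup’.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper: interval between two timestamps in this log's format.
final class LogInterval {
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    static Duration between(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, LOG_TS), LocalDateTime.parse(to, LOG_TS));
    }

    public static void main(String[] args) {
        Duration waited = between("2014-01-31 13:40:36.408",  // last "Handshaking version with" line
                                  "2014-01-31 13:41:07.415"); // "Exception encountered during startup"
        System.out.println(waited.toMillis() + " ms"); // prints "31007 ms"
    }
}
```

A gap of roughly 31 seconds with no handshake from either seed suggests that doShadowRound waited for answers until it gave up, rather than failing immediately.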
