To: user@cassandra.apache.org
From: Aaron Morton <aaron@thelastpickle.com>
Subject: Re: ApplicationState Schema has drifted from DatabaseDescriptor
Date: Wed, 09 Feb 2011 22:31:54 GMT

Thanks Gary. I'll keep an eye on things and see if it happens again.

From reading the code I'm wondering if there is a small chance of a race condition in HintedHandoffManager.waitForSchemaAgreement().

Could the following happen? I'm a little unsure on exactly how the endpoint state is removed from the map in Gossiper.

1) Node 1 starts.
2) Gossiper calls StorageService.onAlive() when the endpoints are detected as alive.
3) HintedHandoffManager.deliverHints() adds a runnable to the HintedHandoff thread pool.
4) This happens several times, and node 1 gets busy delivering hints, but there is only 1 thread in the thread pool.
5) Node n is removed from the cluster and its endpoint state is deleted in the Gossiper on node 1.
6) Node 1 gets around to processing the hints for node n, and Gossiper.getEndpointStateForEndpoint() returns null for node n (a rough simulation of this ordering is sketched below).
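
To make the suspected ordering concrete, here is a small self-contained simulation (plain Java, not Cassandra code; the map and method names only mirror the Gossiper and hinted-handoff names used in the steps above, and the endpoint/schema strings are placeholders):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class EndpointStateRaceSketch
    {
        // stand-in for Gossiper.endpointStateMap: endpoint -> gossiped schema version
        static final ConcurrentHashMap<String, String> endpointStateMap =
                new ConcurrentHashMap<String, String>();

        // stand-in for Gossiper.getEndpointStateForEndpoint(); returns null once removed
        static String getEndpointStateForEndpoint(String endpoint)
        {
            return endpointStateMap.get(endpoint);
        }

        public static void main(String[] args) throws Exception
        {
            final String nodeN = "192.168.114.64"; // placeholder endpoint
            endpointStateMap.put(nodeN, "2f555eb0-3332-11e0-9e8d-c4f8bbf76455");

            // steps 3/4: a single-threaded pool that is already busy delivering other hints
            ExecutorService hintedHandoff = Executors.newSingleThreadExecutor();
            hintedHandoff.submit(new Runnable()
            {
                public void run()
                {
                    try { Thread.sleep(200); } catch (InterruptedException e) { }
                }
            });

            // the queued delivery for node n; it only runs after the pool frees up
            hintedHandoff.submit(new Runnable()
            {
                public void run()
                {
                    // step 6: the state is gone by now, so a waitForSchemaAgreement-style
                    // check that dereferences this value would blow up
                    String state = getEndpointStateForEndpoint(nodeN);
                    System.out.println("state for " + nodeN + ": " + state); // prints null
                }
            });

            // step 5: node n is removed from the cluster while the pool is still busy
            endpointStateMap.remove(nodeN);

            hintedHandoff.shutdown();
            hintedHandoff.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

The second task reliably sees null because the single hinted-handoff thread is still busy when the endpoint state is removed, which is the window waitForSchemaAgreement() would have to tolerate.
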


Thanks
Aaron

On 10 Feb, 2011,at 03:03 AM, Gary Dusbabek <gdusba= bek@gmail.com> wrote:

Aaron,
=0A
=0AIt looks l= ike you're experiencing a side-effect of CASSANDRA-2083.
=0AThere was a= t least one place (when node B received updated schema from
=0Anode A) = where gossip was not being updated with the correct schema
=0Aeven thou= gh DatabaseDescriptor had the right version. I'm pretty sure
=0Athis i= s what you're seeing.
=0A
=0AGary.
=0A
=0A
=0AOn Wed, Feb 9= , 2011 at 00:08, Aaron Morton <aaron@thelastpickle.com> wrote:
> I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest
> stable 0.7 build "2011-02-08_20-41-25" (upgraded node was jb-cass1 below).
> This is a long email; you can jump to the end and help me out by checking
> something on your 0.7 cluster.
> This is the value from o.a.c.gms.FailureDetector.AllEndpointStates on
=0A> jb-cass05 9114.67)
> /192.168.114.63   X3:2011-02-08_20-41-25
> SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455   LOAD:2.84182972E8
> STATUS:NORMAL,0
> /192.168.114.64   SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> LOAD:2.84354156E8   STATUS:NORMAL,34028236692093846346337460743176821145
> /192.168.114.66   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.59171601E8   STATUS:NORMAL,102084710076281539039012382229530463435
> /192.168.114.65   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.70907168E8   STATUS:NORMAL,68056473384187692692674921486353642290
> jb08.wetafx.co.nz/192.168.114.67
> SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:1.155260665E9
> STATUS:NORMAL,136112946768375385385349842972707284580
> Notice the schema for nodes 63 and 64 starts with 2f55 and for 65, 66 and 67
> it starts with 075.
> This is the output from pycassa calling describe_schema_versions when connected to
> both the 63 (jb-cass1) and 67 (jb-cass5) nodes:
> In [34]: sys.describe_schema_versions()
> Out[34]:
> {'2f555eb0-3332-11e0-9e8d-c4f8bbf76455': ['192.168.114.63',
>                                           '192.168.114.64',
>                                           '192.168.114.65',
>                                           '192.168.114.66',
>                                           '192.168.114.67']}
> It's reporting all nodes on the 2f55 schema. The SchemaCheckVerbHandler is
> getting the value from DatabaseDescriptor; the FailureDetector MBean is getting
> them from Gossiper.endpointStateMap. Requests are working though, so the
> CF ids must be matching up.
> Commit https://github.com/apache/cassandra/commit/ecbd71f6b4bb004d26e585ca8a7e642436a5c1a4 added
> code to the 0.7 branch in the HintedHandOffManager to check the schema
> versions of nodes it has hints for. This is now failing on the new node as
> follows...
> ERROR [HintedHandoff:1] 2011-02-09 16:11:23,559 AbstractCassandraDaemon.java
> (line org.apache.cassandra.service.AbstractCassandraDaemon$1.uncaughtException(AbstractCassandraDaemon.java:114))
> Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach
> schema agreement with /192.168.114.64 in 60000ms
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Could not reach schema agreement with
> /192.168.114.64 in 60000ms
>         at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:256)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
>         at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>         at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
> (the nodes can all see each other, checked with nodetool during the 60
> seconds)
> If I restart one of the nodes with the 075 schema (without upgrading it), it
> reads the schema from the system tables and goes back to the 2f55 schema.
> e.g. the 64 node was also on the 075 schema; I restarted it and it moved to the
> 2f55 and logged appropriately. While writing this email I checked again with
> the 65 node, and the schema it was reporting to other nodes changed after a
> restart from 075 to 2f55:
> INFO [main] 2011-02-09 17:17:11,457 DatabaseDescriptor.java (line
> org.apache.cassandra.config.DatabaseDescriptor) Loading schema version
> 2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> I've been reading the code for migrations and gossip but don't have a theory as
> to what is going on.
>
> REQUEST FOR HELP:
> If you have a 0.7 cluster, can you please check if this has happened, so I can
> know whether this is a real problem or just an Aaron problem. You can check by...
> - getting the values from the o.a.c.gms.FailureDetector.AllEndPointStates
>   (a rough JMX sketch for doing this is below the quoted message)
> - running describe_schema_versions via the API; here is how to do it via
>   pycassa: http://pycassa.github.com/pycassa/api/pycassa/system_manager.html?highlight=describe_schema_versions
> - checking that the schema ids from the failure detector match the result
>   from describe_schema_versions()
> - if they do not match, can you also include some info on what sort of schema
>   changes have happened on the box.
> Thanks
> Aaron
>
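
For anyone wanting to script the first check rather than open jconsole, here is a minimal JMX sketch (not from the Cassandra tree; the MBean name follows the o.a.c.gms.FailureDetector reference above, the attribute is assumed to be exposed as a plain String, and the host/port defaults are placeholders to adjust for your cluster):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CheckGossipSchema
    {
        public static void main(String[] args) throws Exception
        {
            // placeholders: point these at one of your nodes and its JMX port
            String host = args.length > 0 ? args[0] : "192.168.114.63";
            String port = args.length > 1 ? args[1] : "8080";

            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try
            {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // o.a.c.gms.FailureDetector, as referenced in the check list above
                ObjectName fd = new ObjectName("org.apache.cassandra.gms:type=FailureDetector");
                Object states = mbs.getAttribute(fd, "AllEndpointStates");
                System.out.println(states);
            }
            finally
            {
                connector.close();
            }
        }
    }

The string it prints is the same endpoint-state blob shown above, so it can be compared against the map that describe_schema_versions() returns from pycassa.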