To: user@cassandra.apache.org
From: Aaron Morton <aaron@thelastpickle.com>
Subject: Re: ApplicationState Schema has drifted from DatabaseDescriptor
Date: Wed, 09 Feb 2011 22:31:54 GMT

Thanks Gary. I'll keep an eye on things and see if it happens again.

From reading the code I'm wondering if there is a small chance of a race condition in HintedHandoffManager.waitForSchemaAgreement().

Could the following happen? I'm a little unsure on exactly how the endpoint state is removed from the map in Gossiper.

1) Node 1 starts.
2) Gossiper calls StorageService.onAlive() when the endpoints are detected as alive.
3) HintedHandoffManager.deliverHints() adds a runnable to the HintedHandoff thread pool.
4) This happens several times, and node 1 gets busy delivering hints, but there is only 1 thread in the thread pool.
5) Node n is removed from the cluster and its endpoint state is deleted in the Gossiper on node 1.
6) Node 1 gets around to processing the hints for node n, and Gossiper.getEndpointStateForEndpoint() returns null for node n (a rough simulation of this ordering is sketched below).
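
To make the suspected ordering concrete, here is a small self-contained simulation (plain Java, not Cassandra code; the map and method names only mirror the Gossiper and hinted-handoff names used in the steps above, and the endpoint/schema strings are placeholders):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class EndpointStateRaceSketch
    {
        // stand-in for Gossiper.endpointStateMap: endpoint -> gossiped schema version
        static final ConcurrentHashMap<String, String> endpointStateMap =
                new ConcurrentHashMap<String, String>();

        // stand-in for Gossiper.getEndpointStateForEndpoint(); returns null once removed
        static String getEndpointStateForEndpoint(String endpoint)
        {
            return endpointStateMap.get(endpoint);
        }

        public static void main(String[] args) throws Exception
        {
            final String nodeN = "192.168.114.64"; // placeholder endpoint
            endpointStateMap.put(nodeN, "2f555eb0-3332-11e0-9e8d-c4f8bbf76455");

            // steps 3/4: a single-threaded pool that is already busy delivering other hints
            ExecutorService hintedHandoff = Executors.newSingleThreadExecutor();
            hintedHandoff.submit(new Runnable()
            {
                public void run()
                {
                    try { Thread.sleep(200); } catch (InterruptedException e) { }
                }
            });

            // the queued delivery for node n; it only runs after the pool frees up
            hintedHandoff.submit(new Runnable()
            {
                public void run()
                {
                    // step 6: the state is gone by now, so a waitForSchemaAgreement-style
                    // check that dereferences this value would blow up
                    String state = getEndpointStateForEndpoint(nodeN);
                    System.out.println("state for " + nodeN + ": " + state); // prints null
                }
            });

            // step 5: node n is removed from the cluster while the pool is still busy
            endpointStateMap.remove(nodeN);

            hintedHandoff.shutdown();
            hintedHandoff.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

The second task reliably sees null because the single hinted-handoff thread is still busy when the endpoint state is removed, which is the window waitForSchemaAgreement() would have to tolerate.
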


Thanks
Aaron

On 10 Feb, 2011,at 03:03 AM, Gary Dusbabek <gdusba= bek@gmail.com> wrote:

Aaron,
=0A
=0AIt looks l= ike you're experiencing a side-effect of CASSANDRA-2083.
=0AThere was a= t least one place (when node B received updated schema from
=0Anode A) = where gossip was not being updated with the correct schema
=0Aeven thou= gh DatabaseDescriptor had the right version. I'm pretty sure
=0Athis i= s what you're seeing.
=0A
=0AGary.
=0A
=0A
=0AOn Wed, Feb 9= , 2011 at 00:08, Aaron Morton <aaron@thelastpickle.com> wrote:
> I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest
> stable 0.7 build "2011-02-08_20-41-25" (upgraded node was jb-cass1 below).
> This is a long email; you can jump to the end and help me out by checking
> something on your 0.7 cluster.
> This is the value from o.a.c.gms.FailureDetector.AllEndpointStates on
=0A> jb-cass05 9114.67)
> /192.168.114.63   X3:2011-02-08_20-41-25
> SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455   LOAD:2.84182972E8
> STATUS:NORMAL,0
> /192.168.114.64   SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> LOAD:2.84354156E8   STATUS:NORMAL,34028236692093846346337460743176821145
> /192.168.114.66   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.59171601E8   STATUS:NORMAL,102084710076281539039012382229530463435
> /192.168.114.65   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.70907168E8   STATUS:NORMAL,68056473384187692692674921486353642290
> jb08.wetafx.co.nz/192.168.114.67
> SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:1.155260665E9
> STATUS:NORMAL,136112946768375385385349842972707284580
> Notice the schema for nodes 63 and 64 starts with 2f55 and for 65, 66 and 67
> it starts with 075.
> This is the output from pycassa calling describe_schema_versions when connected to
> both the 63 (jb-cass1) and 67 (jb-cass5) nodes:
> In [34]: sys.describe_schema_versions()
> Out[34]:
> {'2f555eb0-3332-11e0-9e8d-c4f8bbf76455': ['192.168.114.63',
>                                           '192.168.114.64',
>                                           '192.168.114.65',
>                                           '192.168.114.66',
>                                           '192.168.114.67']}
> It's reporting all nodes on the 2f55 schema. The SchemaCheckVerbHandler is
> getting the value from DatabaseDescriptor; the FailureDetector MBean is getting
> them from Gossiper.endpointStateMap. Requests are working though, so the
> CF ids must be matching up.
> Commit https://github.com/apache/cassandra/commit/ecbd71f6b4bb004d26e585ca8a7e642436a5c1a4 added
> code to the 0.7 branch in the HintedHandOffManager to check the schema
> versions of nodes it has hints for. This is now failing on the new node as
> follows...
> ERROR [HintedHandoff:1] 2011-02-09 16:11:23,559 AbstractCassandraDaemon.java
> (line org.apache.cassandra.service.AbstractCassandraDaemon$1.uncaughtException(AbstractCassandraDaemon.java:114))
> Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach
> schema agreement with /192.168.114.64 in 60000ms
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Could not reach schema agreement with
> /192.168.114.64 in 60000ms
>         at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:256)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
>         at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>         at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
> (the nodes can all see each other, checked with nodetool during the 60
> seconds)
> If I restart one of the nodes with the 075 schema (without upgrading it), it
> reads the schema from the system tables and goes back to the 2f55 schema.
> e.g. the 64 node was also on the 075 schema; I restarted it and it moved to the
> 2f55 and logged appropriately. While writing this email I checked again with
> the 65 node, and the schema it was reporting to other nodes changed after a
> restart from 075 to 2f55:
> INFO [main] 2011-02-09 17:17:11,457 DatabaseDescriptor.java (line
> org.apache.cassandra.config.DatabaseDescriptor) Loading schema version
> 2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> I've been reading the code for migrations and gossip but don't have a theory as
> to what is going on.
>
> REQUEST FOR HELP:
> If you have a 0.7 cluster, can you please check if this has happened, so I can
> know whether this is a real problem or just an Aaron problem. You can check by...
> - getting the values from the o.a.c.gms.FailureDetector.AllEndPointStates
>   (a rough JMX sketch for doing this is below the quoted message)
> - running describe_schema_versions via the API; here is how to do it via
>   pycassa: http://pycassa.github.com/pycassa/api/pycassa/system_manager.html?highlight=describe_schema_versions
> - checking that the schema ids from the failure detector match the result
>   from describe_schema_versions()
> - if they do not match, can you also include some info on what sort of schema
>   changes have happened on the box.
> Thanks
> Aaron
>
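
For anyone wanting to script the first check rather than open jconsole, here is a minimal JMX sketch (not from the Cassandra tree; the MBean name follows the o.a.c.gms.FailureDetector reference above, the attribute is assumed to be exposed as a plain String, and the host/port defaults are placeholders to adjust for your cluster):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CheckGossipSchema
    {
        public static void main(String[] args) throws Exception
        {
            // placeholders: point these at one of your nodes and its JMX port
            String host = args.length > 0 ? args[0] : "192.168.114.63";
            String port = args.length > 1 ? args[1] : "8080";

            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try
            {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // o.a.c.gms.FailureDetector, as referenced in the check list above
                ObjectName fd = new ObjectName("org.apache.cassandra.gms:type=FailureDetector");
                Object states = mbs.getAttribute(fd, "AllEndpointStates");
                System.out.println(states);
            }
            finally
            {
                connector.close();
            }
        }
    }

The string it prints is the same endpoint-state blob shown above, so it can be compared against the map that describe_schema_versions() returns from pycassa.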