incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gary Dusbabek <gdusba...@gmail.com>
Subject Re: ApplicationState Schema has drifted from DatabaseDescriptor
Date Wed, 09 Feb 2011 14:03:03 GMT
Aaron,

It looks like you're experiencing a side-effect of CASSANDRA-2083.
There was at least one place (when node B received updated schema from
node A) where gossip was not being updated with the correct schema
even though DatabaseDescriptor had the right version.  I'm pretty sure
this is what you're seeing.

Gary.


On Wed, Feb 9, 2011 at 00:08, Aaron Morton <aaron@thelastpickle.com> wrote:
> I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest
> stable 0.7 build "2011-02-08_20-41-25" (upgraded  node was jb-cass1 below).
> This is a long email, you can jump to the end and help me out by checking
> something on your  07 cluster.
> This is the value from o.a.c.gms.FailureDetector.AllEndpointStates on
> jb-cass05 9114.67)
> /192.168.114.63   X3:2011-02-08_20-41-25
> SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455   LOAD:2.84182972E8
> STATUS:NORMAL,0
> /192.168.114.64   SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> LOAD:2.84354156E8   STATUS:NORMAL,34028236692093846346337460743176821145
> /192.168.114.66   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.59171601E8   STATUS:NORMAL,102084710076281539039012382229530463435
> /192.168.114.65   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.70907168E8   STATUS:NORMAL,68056473384187692692674921486353642290
> jb08.wetafx.co.nz/192.168.114.67
> SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:1.155260665E9
> STATUS:NORMAL,136112946768375385385349842972707284580
> Notice the schema for nodes 63 and 64 starts with 2f55 and for 65, 66 and 67
> it starts with 075.
> This is the output from pycassa calling describe_versions when connected to
> both the 63 (jb-cass1) and 67 (jb-cass5) nodes
> In [34]: sys.describe_schema_versions()
> Out[34]:
> {'2f555eb0-3332-11e0-9e8d-c4f8bbf76455': ['192.168.114.63',
>                                           '192.168.114.64',
>                                           '192.168.114.65',
>                                           '192.168.114.66',
>                                           '192.168.114.67']}
> It's reporting all nodes on the 2f55 schema. The SchemaCheckVerbHandler is
> getting the value from DatabaseDescriptor. FailureDetector MBean is getting
> them from Gossiper.endpointStateMap . Requests are working though, so the
> CFid's must be matching up.
> Commit https://github.com/apache/cassandra/commit/ecbd71f6b4bb004d26e585ca8a7e642436a5c1a4 added
> code to the 0.7 branch in the HintedHandOffManager to check the schema
> versions of nodes it has hints for. This is now failing on the new node as
> follows...
> ERROR [HintedHandoff:1] 2011-02-09 16:11:23,559 AbstractCassandraDaemon.java
> (line
> org.apache.cassandra.service.AbstractCassandraDaemon$1.uncaughtException(AbstractCassandraDaemon.java:114))
> Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach
> schema agreement with /192.168.114.64 in 60000ms
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Could not reach schema agreement with
> /192.168114.64 in 60000ms
>         at
> org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:256)
>         at
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
>         at
> org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>         at
> org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
> (the nodes can all see each other, checked with notetool during the 60
> seconds)
> If I restart one of the nodes with the 075 schema (without upgrading it) it
> reads the schema from the system tables and goes back to the 2f55 schema.
> e.g. the 64 node was also on the 075 schema, I restarted and it moved to the
> 2f55 and logged appropriately. While writing this email I checked again with
> the 65 node, and the schema if was reporting to other nodes changed after a
> restart from 075 to 2f55
> INFO [main] 2011-02-09 17:17:11,457 DatabaseDescriptor.java (line
> org.apache.cassandra.config.DatabaseDescriptor) Loading schema version
> 2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> I've been reading the code for migrations and gossip don't have a theory as
> to what is going on.
>
> REQUEST FOR HELP:
> If you have a 0.7 cluster can you please check if this has happened so I can
> know this is a real problem or just an Aaron problem. You can check by...
> - getting the values from the o.a.c.gms.FailureDetector.AllEndPointStates
> - running describe_schema_versions via the API, here is how to do it via
> pycassa http://pycassa.github.com/pycassa/api/pycassa/system_manager.html?highlight=describe_schema_versions
> - checking at the schema ids' from the failure detector match the result
> from describe_schema_versions()
> - if they do not match can you also include some info on what sort of schema
> changes have happened on the box.
> Thanks
> Aaron
>

Mime
View raw message