incubator-cassandra-user mailing list archives

From Aaron Morton <>
Subject ApplicationState Schema has drifted from DatabaseDescriptor
Date Wed, 09 Feb 2011 06:08:02 GMT
I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest stable 0.7 build
"2011-02-08_20-41-25" (the upgraded node is jb-cass1 below). This is a long email; you can
jump to the end and help me out by checking something on your 0.7 cluster.

This is the value from o.a.c.gms.FailureDetector.AllEndpointStates on jb-cass05 9114.67)

/   X3:2011-02-08_20-41-25   SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455   LOAD:2.84182972E8   STATUS:NORMAL,0
/   SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455   LOAD:2.84354156E8   STATUS:NORMAL,34028236692093846346337460743176821145
/   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:2.59171601E8   STATUS:NORMAL,102084710076281539039012382229530463435
/   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:2.70907168E8   STATUS:NORMAL,68056473384187692692674921486353642290
SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:1.155260665E9

Notice the schema for nodes 63 and 64 starts with 2f55 and for 65, 66 and 67 it starts with 075c.

This is the output from pycassa calling describe_schema_versions when connected to both the 63 (jb-cass1)
and 67 (jb-cass5) nodes:

In [34]: sys.describe_schema_versions()
{'2f555eb0-3332-11e0-9e8d-c4f8bbf76455': ['',

It's reporting all nodes on the 2f55 schema. The SchemaCheckVerbHandler gets its value
from DatabaseDescriptor, while the FailureDetector MBean gets its values from
Gossiper.endpointStateMap. Requests are working though, so the CF ids must be matching up.
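
To make the gossip side of that comparison easier to eyeball, here is a minimal Python sketch that pulls the SCHEMA value out of an AllEndpointStates-style dump and groups endpoints by it. It assumes one endpoint per line in the "/address   KEY:VALUE ..." layout shown above. The function names, the addresses and the tokens in SAMPLE are made up for illustration; only the two schema ids come from this email.

```python
from collections import defaultdict


def schema_by_endpoint(dump):
    """Map each endpoint in the dump to the SCHEMA value gossip holds for it."""
    result = {}
    for line in dump.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("/"):
            continue  # skip blank lines and anything that is not an endpoint line
        endpoint = fields[0].lstrip("/")
        for field in fields[1:]:
            key, _, value = field.partition(":")
            if key == "SCHEMA":
                result[endpoint] = value
    return result


def endpoints_by_schema(dump):
    """Group endpoints by the schema version gossip reports for them."""
    groups = defaultdict(list)
    for endpoint, schema in schema_by_endpoint(dump).items():
        groups[schema].append(endpoint)
    return dict(groups)


# Hypothetical addresses and tokens; schema ids are the two seen in this email.
SAMPLE = """\
/10.0.0.63   SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455   LOAD:2.84182972E8   STATUS:NORMAL,0
/10.0.0.65   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:2.59171601E8   STATUS:NORMAL,1
"""

if __name__ == "__main__":
    for schema, endpoints in endpoints_by_schema(SAMPLE).items():
        print(schema, endpoints)
```

More than one key in the result means gossip is advertising more than one schema version.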

Commit added
code to the 0.7 branch in the HintedHandOffManager to check the schema versions of nodes it
has hints for. This is now failing on the new node as follows...

ERROR [HintedHandoff:1] 2011-02-09 16:11:23,559 (line org.apache.cassandra.service.AbstractCassandraDaemon$1.uncaughtException(
Fatal exception in thread Thread[HintedHandoff:1,1,main]
java.lang.RuntimeException: java.lang.RuntimeException: Could not reach schema agreement with
/ in 60000ms
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
        at java.util.concurrent.ThreadPoolExecutor$
Caused by: java.lang.RuntimeException: Could not reach schema agreement with /
in 60000ms
        at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(
        at org.apache.cassandra.db.HintedHandOffManager.access$100(
        at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(
        ... 3 more

(The nodes can all see each other; I checked with nodetool during the 60 seconds.)
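
For anyone not familiar with that code path, the failure has the shape of a poll-until-equal loop with a deadline. The following is only a rough Python illustration of that pattern, not the actual Java in HintedHandOffManager. The function name, parameters and polling interval are my own assumptions; the only detail taken from the log is the 60000ms limit and the error text.

```python
import time


def wait_for_schema_agreement(get_local, get_remote, timeout_s=60.0,
                              poll_s=1.0, sleep=time.sleep,
                              clock=time.monotonic):
    """Poll until both callables return the same schema id, or give up.

    get_local / get_remote are hypothetical callables returning the schema
    version this node has and the version it believes the target node has.
    """
    deadline = clock() + timeout_s
    while True:
        if get_local() == get_remote():
            return  # versions agree, hints can be delivered
        if clock() >= deadline:
            raise RuntimeError(
                "Could not reach schema agreement in %dms" % int(timeout_s * 1000))
        sleep(poll_s)
```

If the remote version is read from a stale source, a loop like this can never succeed even though the nodes are up and serving requests, which would match what I am seeing.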

If I restart one of the nodes with the 075 schema (without upgrading it) it reads the schema
from the system tables and goes back to the 2f55 schema. For example, the 64 node was also on the
075 schema; I restarted it and it moved to 2f55 and logged appropriately. While writing this
email I checked again with the 65 node, and the schema it was reporting to other nodes changed
from 075 to 2f55 after a restart:

INFO [main] 2011-02-09 17:17:11,457 (line org.apache.cassandra.config.DatabaseDescriptor)
Loading schema version 2f555eb0-3332-11e0-9e8d-c4f8bbf76455

I've been reading the code for migrations and gossip, but don't have a theory as to what is going on.


If you have a 0.7 cluster, can you please check whether this has happened, so I know if this is
a real problem or just an Aaron problem. You can check by...
- getting the values from o.a.c.gms.FailureDetector.AllEndpointStates
- running describe_schema_versions via the API (via pycassa this is the sys.describe_schema_versions() call shown earlier)
- checking that the schema ids from the failure detector match the result from describe_schema_versions()
- if they do not match, can you also include some info on what sort of schema changes have
happened on the box.
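
If it helps, the comparison in the steps above can be sketched in a few lines of Python. This assumes you have already copied the gossip values into a dict of endpoint to schema id, and that describe_schema_versions() gives you a dict of schema id to list of endpoints, as in the pycassa output earlier. The function name and the example addresses are hypothetical, and the server address in the commented pycassa lines is an assumption.

```python
def find_schema_drift(gossip_schema, described):
    """Return endpoints whose gossip schema differs from the described one.

    gossip_schema: {endpoint: schema_id}, taken from AllEndpointStates
    described:     {schema_id: [endpoint, ...]}, from describe_schema_versions
    Result:        {endpoint: (gossip_id, described_id)} for each mismatch;
                   described_id is None if the endpoint was not listed.
    """
    api_schema = {ep: sid for sid, eps in described.items() for ep in eps}
    drift = {}
    for endpoint, gossip_id in gossip_schema.items():
        api_id = api_schema.get(endpoint)
        if api_id != gossip_id:
            drift[endpoint] = (gossip_id, api_id)
    return drift


# Getting `described` from a live cluster with pycassa (not run here;
# the server address is an assumption):
#   from pycassa.system_manager import SystemManager
#   sys = SystemManager("localhost:9160")
#   described = sys.describe_schema_versions()

if __name__ == "__main__":
    # Hypothetical addresses; schema ids abbreviated for readability.
    gossip = {"10.0.0.63": "2f55", "10.0.0.65": "075c"}
    described = {"2f55": ["10.0.0.63", "10.0.0.65"]}
    print(find_schema_drift(gossip, described))
```

An empty result means the two views agree; anything else is the drift I am describing.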

