incubator-cassandra-user mailing list archives

From Alex Heneveld <alex.henev...@cloudsoftcorp.com>
Subject SchemaDisagreementError when launching a new Cassandra (1.2.2) cluster ?
Date Mon, 09 Sep 2013 15:07:54 GMT

Hi folks,

I'm occasionally seeing SchemaDisagreementError on the boot of a *new* 
cluster.  I'm hoping someone can explain what I'm doing wrong, or help 
me track down the bug if it is one.

The problem occurs in about 1 in 4 launches when I start a 2-node 
cluster, where the two machines are configured identically (apart from 
the listen_address being different) with both nodes as the seeds. On 
the problematic launches, describing schema versions immediately after 
start shows that the two nodes have different schemas (reported at both 
nodes) and any attempt to work with the nodes returns the SDE.  This is 
before I attempt to do anything to the cluster.  After ~60s the nodes 
reconcile their differences, report a single schema used at both nodes, 
and I can use the cluster without problems.

Key points:

* The problem usually fixes itself 60s after startup (almost exactly; I 
poll every second)

* The problem is intermittent, occurring on between 10% and 50% of 
launches (failure rates seem higher at peak cloud times -- so possibly 
linked to background CPU/network/storage contention)

* For the problem period (the initial 60s), peer size is reported as 2, 
and both nodes report the same schema versions map containing two 
schemas each with one of the nodes against them (after 60s the map 
contains one schema with both nodes)

* In some of the problematic launches, it takes ~120s to reconcile, 
where for the first 60s the nodes do not seem to see each other at all 
(each reports peer size 1, and a single schema used by only one node 
(itself)), then for the next 60s the problem is as described above 
(disagreeing schemas); again the 60s/120s seems meaningfully precise

* The problem occurs whether the two nodes are launched simultaneously 
or are launched with a delay between the two
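
For anyone who wants to reproduce the timings, here is a minimal sketch of the kind of once-per-second check described above. It assumes a hypothetical fetch_versions callable that returns the schema versions map in the shape described in the key points (schema version ID mapped to the list of nodes reporting it); how that map is obtained (for example via a Thrift client's describe_schema_versions call) is left out, and the names schema_agreed and wait_for_agreement are illustrative only.

```python
import time
from typing import Callable, Dict, List

# Schema version ID -> addresses of the nodes reporting that version.
SchemaVersions = Dict[str, List[str]]


def schema_agreed(versions: SchemaVersions, expected_nodes: int) -> bool:
    """Return True when there is exactly one schema version and
    every expected node is listed against it."""
    if len(versions) != 1:
        return False
    (nodes,) = versions.values()
    return len(set(nodes)) == expected_nodes


def wait_for_agreement(
    fetch_versions: Callable[[], SchemaVersions],  # hypothetical: supplied by the caller
    expected_nodes: int,
    timeout_s: float = 180.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll until the schema versions agree; return the seconds waited.

    Raises TimeoutError if agreement is not reached within timeout_s.
    """
    start = clock()
    while True:
        if schema_agreed(fetch_versions(), expected_nodes):
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError("schema versions did not converge in time")
        sleep(interval_s)
```

With expected_nodes=2, both problem states described above (one schema listing a single node, or two schemas listing one node each) count as "not agreed", so the returned duration covers the whole 60s or 120s window.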

I have a workaround, which is to use just one node to seed this initial 
set.  When the set of seeds has cardinality 1, the problem does not 
occur.  However the advice is to use 2 seeds and have them be the same 
across the cluster -- so I'd like to get to the bottom of this!

I'd also like to be sure that any subsequent nodes added to the cluster 
aren't going to cause the same problem when we start using it!

I am running Cassandra 1.2.2 in Amazon, using Brooklyn 
(brooklyn.io) to start and manage it.  I can share test cases, 
cassandra.yaml, logs, etc -- but am starting with the above summary in 
case anyone can point me in the right direction from that.

Thanks,
Alex

