incubator-cassandra-user mailing list archives

From graham sanderson
Subject Strange slow schema agreement on 2.0.9 ... anyone seen this?
Date Fri, 08 Aug 2014 20:45:14 GMT
We recently upgraded C* from 2.0.5 to 2.0.9.

We have some data that is partitioned in tables created periodically (once a day). This morning,
this automated process timed out because the schema did not reach agreement quickly enough
after we created a new empty table.

I was able to reproduce this manually via CQLSH. When I created the table and ran nodetool
describecluster, it showed 3 nodes on the old schema and 3 nodes on the new schema instantly
(or as quickly as I could run nodetool describecluster). It took almost exactly a minute
for the other nodes to switch.
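
For anyone wanting to time this themselves, here is a rough sketch of polling for schema agreement. It is only an illustration, not what our automated process actually runs. The helper names (parse_schema_versions, wait_for_agreement, run_describecluster) are made up, and the parser assumes the "Schema versions:" section of nodetool describecluster prints one "<uuid>: [node, node, ...]" line per version, which may differ between releases.

```python
import re
import subprocess
import time

# Assumed line layout: "<schema-version-uuid>: [10.0.0.1, 10.0.0.2]".
# Unreachable nodes are assumed to be listed under the key "UNREACHABLE".
_VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def parse_schema_versions(describecluster_output):
    """Map each reported schema version to the list of nodes on it."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = _VERSION_LINE.match(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            versions[match.group(1)] = nodes
    return versions


def wait_for_agreement(fetch_output, timeout_s=90.0, poll_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until every node reports the same schema version.

    Returns the seconds waited, or None if the timeout expires first.
    """
    start = clock()
    while True:
        versions = parse_schema_versions(fetch_output())
        if len(versions) == 1 and "UNREACHABLE" not in versions:
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_s)


def run_describecluster():
    """Hypothetical helper: run nodetool locally and return its stdout."""
    return subprocess.check_output(["nodetool", "describecluster"], text=True)
```

Calling wait_for_agreement(run_describecluster) right after the CREATE TABLE would report how long the split lasted. In the case described here it should come back at roughly 60 seconds.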

The nodes weren’t busy, machines were healthy, network was healthy, JVMs were healthy - nodetool
status, gossipinfo and OpsCenter all looked happy. We never saw this issue in beta on 2.0.9
or anywhere on 2.0.5, and yesterday, on 2.0.9 after the upgrade, it worked correctly.

The only clue I have is that for this case, the nodes which were slow to update called DefsTables.mergeSchema
from InternalResponseStage, not MigrationStage (which is what it is called on as I test it).
Looking at the logs, these InternalResponseStage happened eerily close (within a second) to
exactly a minute.

Having discovered nothing else wrong, I restarted one of the “slow” nodes, and the problem
went away (for that node). So now the cluster has been rolling restarted, and is proceeding

Anyways, I will dig a little deeper as to why (when all nodes think each other are up) the
migration verb might not get executed (there were no errors in any logs)… mostly wondering
if this rings a bell with anyone?