After a chat with driftx today, I tried wiping out my MigrationInfo on the ring and doing a rolling restart. I then made a single schema change so that at least one migration would exist. Unfortunately the same error persists: "Previous version mismatch". Occasionally the node also bootstraps without applying any schema at all on startup. The behaviour is inconsistent, despite wiping the entire data directory and commitlogs on the bootstrapping node.

I added some debug statements to Migration.java to find exactly what the mismatch was. Here is what I have now:

DEBUG [MigrationStage:1] 2012-09-07 04:44:53,669 Migration.java (line 98) lastversion: ee323110-eedf-11e1-0000-5027269873df  getVersion: 00000000-0000-1000-0000-000000000000
DEBUG [MigrationStage:1] 2012-09-07 04:44:53,669 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
DEBUG [MigrationStage:1] 2012-09-07 04:44:53,670 DefinitionsUpdateVerbHandler.java (line 70) Applying UpdateColumnFamily from /10.140.129.18
DEBUG [MigrationStage:1] 2012-09-07 04:44:53,670 Migration.java (line 98) lastversion: ee323110-eedf-11e1-0000-5027269873df  getVersion: 00000000-0000-1000-0000-000000000000
DEBUG [MigrationStage:1] 2012-09-07 04:44:53,670 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.

The "Previous version mismatch" event happens when lastVersion != getVersion. That is clearly the case here, since getVersion on the new node is blank (00000000-0000-1000-0000-000000000000). Don't all nodes bootstrap with a blank schema version? Why would the Migration logic expect lastVersion to match the bootstrapping node's getVersion?
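
To make the comparison concrete, here is a minimal sketch of the check as I understand it from the debug output above. It is not the actual Migration.java code: the class name PreviousVersionCheck and the helper canApply are made up for illustration, and I am assuming the blank version is simply the UUID printed as getVersion in the log.

```java
import java.util.UUID;

public class PreviousVersionCheck {
    // Assumption: the "blank" schema version is the UUID printed as getVersion
    // in the debug output, 00000000-0000-1000-0000-000000000000.
    static final UUID INITIAL_VERSION = new UUID(4096L, 0L);

    // Hypothetical helper: a migration is accepted only when the version it
    // was built on top of (lastVersion) equals the node's current version.
    static boolean canApply(UUID lastVersion, UUID localVersion) {
        return lastVersion.equals(localVersion);
    }

    public static void main(String[] args) {
        // lastVersion taken from the debug output above.
        UUID lastVersion = UUID.fromString("ee323110-eedf-11e1-0000-5027269873df");

        System.out.println("local version: " + INITIAL_VERSION);
        // Migration built on a later schema version, offered to a blank node.
        System.out.println("latest migration applies: " + canApply(lastVersion, INITIAL_VERSION));
        // Migration built on the blank version, offered to a blank node.
        System.out.println("first migration applies:  " + canApply(INITIAL_VERSION, INITIAL_VERSION));
    }
}
```

If the real check works like this, a blank node could only accept a migration whose lastVersion is itself the blank version, so it would presumably need the history replayed from the very first migration rather than being sent only the latest one.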



On Wednesday, September 5, 2012 4:29:34 AM UTC-7, Jason Harvey wrote:
Hey folks,

I have a 1.0.11 ring running in production with 6 nodes. Trying to bootstrap a new node in, and I'm getting the following consistently:

 INFO [main] 2012-09-05 04:24:13,317 StorageService.java (line 668) JOINING: waiting for schema information to complete


After waiting for over 30 minutes, I restarted the node to try again, and got the same thing. Tried wiping out the data dir on the new node, as well. Same result.

Turned on DEBUG, and got the following:

 INFO [main] 2012-09-05 03:58:55,205 StorageService.java (line 668) JOINING: waiting for schema information to complete
DEBUG [MigrationStage:1] 2012-09-05 03:59:11,440 DefinitionsUpdateVerbHandler.java (line 70) Applying UpdateColumnFamily from /10.140.128.218
DEBUG [MigrationStage:1] 2012-09-05 03:59:11,440 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
DEBUG [MigrationStage:1] 2012-09-05 03:59:11,631 DefinitionsUpdateVerbHandler.java (line 70) Applying UpdateColumnFamily from /10.140.128.218
DEBUG [MigrationStage:1] 2012-09-05 03:59:11,631 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.


The logs continue with a bunch of failed migration errors from each node in the ring.

So, I'm guessing that there is a schema history problem on one of my nodes? Any clues on how I can fix this? I had considered wiping out the schema on one of my running nodes and starting it back up, but I'm worried it might not come back if it gets the same errors.


Also as a random question: is there any way to 'merge' historical schema changes together?


Thanks,
Jason