incubator-cassandra-user mailing list archives

From Michael Theroux <mthero...@yahoo.com>
Subject Serious issue updating Cassandra version and topology
Date Sun, 08 Jul 2012 04:05:32 GMT
Hello,

We're in the process of moving a 6-node cluster from RF=1 to RF=3. After raising the replication
factor to 3, we ran nodetool repair and immediately hit an issue on the first node we ran
repair on:

 INFO 03:08:51,536 Starting repair command #1, repairing 2 ranges.
 INFO 03:08:51,552 [repair #3e724fe0-c8aa-11e1-0000-4f728ab9d6ff] new session: will sync xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101,
/10.29.187.61 on range (Token(bytes[d5555555555555555555555555555558]),Token(bytes[00000000000000000000000000000000])]
for xxxxx.[aaaaa, bbbbb, ccccc, ddddd, eeeee, fffff, ggggg, hhhhh, iiiii, jjjjj, kkkkk, lllll,
mmmmm, nnnnn, ooooo, ppppp, qqqqq, rrrrr, sssss]
 INFO 03:08:51,555 [repair #3e724fe0-c8aa-11e1-0000-4f728ab9d6ff] requesting merkle trees
for aaaaa (to [/10.29.187.61, xxx-xx-xx-xxx-compute-1.amazonaws.com/10.202.99.101])
 INFO 03:08:52,719 [repair #3e724fe0-c8aa-11e1-0000-4f728ab9d6ff] Received merkle tree for
aaaaa from /10.29.187.61
 INFO 03:08:53,518 [repair #3e724fe0-c8aa-11e1-0000-4f728ab9d6ff] Received merkle tree for
aaaaa from xxx-xx-xx-xxx-.compute-1.amazonaws.com/10.202.99.101
 INFO 03:08:53,519 [repair #3e724fe0-c8aa-11e1-0000-4f728ab9d6ff] requesting merkle trees
for bbbbb (to [/10.29.187.61, xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101])
 INFO 03:08:53,639 [repair #3e724fe0-c8aa-11e1-0000-4f728ab9d6ff] Endpoints /10.29.187.61
and xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101 are consistent for aaaaa
 INFO 03:08:53,640 [repair #3e724fe0-c8aa-11e1-0000-4f728ab9d6ff] aaaaa is fully synced (18
remaining column family to sync for this session)
 INFO 03:08:54,049 [repair #3e724fe0-c8aa-11e1-0000-4f728ab9d6ff] Received merkle tree for
bbbbb from /10.29.187.61
ERROR 03:09:09,440 Exception in thread Thread[ValidationExecutor:1,1,main]
java.lang.AssertionError: row DecoratedKey(Token(bytes[efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47]),
efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47) received out of order wrt
DecoratedKey(Token(bytes[f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb]),
f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb)
	at org.apache.cassandra.service.AntiEntropyService$Validator.add(AntiEntropyService.java:349)
	at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:712)
	at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:68)
	at org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:438)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

From the log above, it looks like the sync of the "aaaaa" column family was successful. However,
the "bbbbb" column family resulted in this error, and the repair hung afterwards. We ran
nodetool scrub on all nodes, invalidated the key and row caches, and tried again (with RF=2),
but that did not alleviate the problem.
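
To make the "out of order" complaint concrete, here is a minimal sketch of the comparison the assertion appears to be making. It assumes that with ByteOrderedPartitioner the token is the raw key and that tokens are compared as unsigned bytes, lexicographically; the in_order helper is hypothetical and is not Cassandra code. The two keys are copied from the AssertionError above.

```python
# Keys copied verbatim from the AssertionError in the log above.
received = bytes.fromhex(
    "efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47"
)
previous = bytes.fromhex(
    "f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb"
)


def in_order(prev_key: bytes, next_key: bytes) -> bool:
    """Hypothetical helper: True if next_key sorts strictly after prev_key.

    Python compares bytes objects lexicographically by unsigned byte value,
    which is the ordering assumed here for ByteOrderedPartitioner tokens.
    """
    return prev_key < next_key


# The row starting with efd5... arrived after the row starting with f33a...,
# but 0xef < 0xf3, so it sorts before it.
print(in_order(previous, received))
```

Under that assumption the check fails for this pair, which matches the error: the validation compaction handed the validator a row that sorts before the one it had just seen.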

Some other important pieces of information:
We use ByteOrderedPartitioner (we MD5 hash the keys ourselves)
We're using Leveled Compaction
As we're in the middle of a transition, one node is on 1.1.2 (the one we tried repair on),
the other 5 are on 1.1.1

Thanks,
-Mike