I'm currently the proud owner of an 8-node cluster that won't start up.

Yesterday we had a developer doing very high-volume writes to our cluster via a Hadoop job that read an HDFS file and ran six concurrent mappers on each of the 8 nodes, using Hector to do the load, and it pretty much killed Cassandra.  We were running 0.7.0, and the load actually killed three of the nodes with OutOfMemory errors before he realized something was awry and killed the job.  He then tried to get rid of the keyspace by dropping it in the CLI and got the following error:

javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=devks,columnfamily=OriginCF

So he punted to me, and I decided to just try restarting the cluster in the hope that it would sort itself out.  The nodes that were still up shut down gracefully with the stop-server command, no kill -9 required.  But when I tried to start the nodes again, they all failed with stack traces.

My googling led me to this: https://issues.apache.org/jira/browse/CASSANDRA-2197

So I upgraded to 0.7.2 and tried restarting.  Once again all the nodes fail, with two different stack traces, but both types occur immediately after an INFO message of the form:

INFO 12:06:26,979 Finished reading /path/to/commitlog/etc/CommitLog-NNNNNNNN.log

The stack traces are one of:

Exception encountered during startup.
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:246)


Exception encountered during startup.
    at org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:318)

Fortunately, I have the luxury of clearing out the data in the cluster, but I'd like a more elegant option than that.  Anybody have any suggestions?
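For reference, the blunt fix I have in mind is roughly the following, run per node (a sketch only, not something I've verified on this cluster; the directory paths are placeholders for whatever commitlog_directory points to in your cassandra.yaml, and I'm moving the segments aside rather than deleting them so they can be restored if this turns out not to help):

```shell
#!/bin/sh
# Placeholders -- substitute the commitlog_directory from cassandra.yaml.
COMMITLOG_DIR="${COMMITLOG_DIR:-/path/to/commitlog}"
BACKUP_DIR="${BACKUP_DIR:-/path/to/commitlog-backup}"

# Stop Cassandra on this node before touching the commit log.
# Then move the segments aside instead of deleting them, so startup
# skips replay of the (apparently corrupt) segments but nothing is
# irrecoverably lost if the diagnosis is wrong.
mkdir -p "$BACKUP_DIR"
mv "$COMMITLOG_DIR"/CommitLog-*.log "$BACKUP_DIR"/
```

That loses any writes that were only in the commit log, which is exactly why I'd rather hear a more elegant option first.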