Short answer: yes, it's safe to kill Cassandra during a repair. It's one of the nice things about never mutating data.

Longer answer: if nodetool compactionstats says there are no Validation compactions running (and the compaction queue is empty), and nodetool netstats says there is nothing streaming, there is a good chance the repair is either finished or dead. If a neighbour dies during a repair, the node it was started on will wait for 48 hours(?) until it times out. Check the logs on the machines for errors, particularly from the AntiEntropyService, and see what compactionstats is saying on all the nodes involved in the repair.
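If you want to script the "finished or dead" check across several nodes, the sketch below captures the rule of thumb above: no pending compactions, no Validation compactions, and nothing streaming on any node involved. It is a minimal sketch under stated assumptions. The function names are hypothetical. It assumes compactionstats output contains a "pending tasks: N" line and one row per active compaction whose first column is the compaction type (e.g. "Validation"); that layout varies between Cassandra versions, so check it against your own output. Streaming state is passed in as a per-node boolean you determine from netstats yourself rather than parsed here.

```python
import re
from dataclasses import dataclass
from typing import Mapping


@dataclass
class CompactionSnapshot:
    pending_tasks: int
    validations_running: int


def parse_compactionstats(text: str) -> CompactionSnapshot:
    """Hypothetical parser for captured `nodetool compactionstats` output.

    Assumes a 'pending tasks: N' line plus one row per active compaction
    whose first whitespace-separated token is the compaction type.
    The real format differs between Cassandra versions.
    """
    match = re.search(r"pending tasks:\s*(\d+)", text)
    if match is None:
        # Refuse to guess: an unrecognised format should not read as "idle".
        raise ValueError("could not find 'pending tasks' in compactionstats output")
    validations = sum(
        1
        for line in text.splitlines()
        if line.split() and line.split()[0] == "Validation"
    )
    return CompactionSnapshot(int(match.group(1)), validations)


def repair_looks_finished_or_dead(
    compactionstats_by_node: Mapping[str, str],
    streaming_by_node: Mapping[str, bool],
) -> bool:
    """True only if every node is idle: no pending or Validation
    compactions and no active streams. This is a heuristic, not proof
    that the repair completed; check the logs as well."""
    for text in compactionstats_by_node.values():
        snap = parse_compactionstats(text)
        if snap.pending_tasks > 0 or snap.validations_running > 0:
            return False
    return not any(streaming_by_node.values())
```

A True result only says the repair is no longer doing visible work; it cannot distinguish "finished" from "dead", which is why the AntiEntropyService log messages are still worth reading.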


Thanks, Aaron. One of the neighboring nodes did go down after running out of memory, so I will make sure the repair is dead and start it again per column family.


Even longer answer: um, 3 TB of data is *way* too much data per node; generally, happy people have up to about 200 to 300 GB per node. The reason for this recommendation is so that things like repair, compaction, and node moves stay manageable, and because the loss of a single node then has less of an impact. I would not recommend running a live system with that much data per node.


Thanks for the advice. This can be a separate discussion, but that would make a Cassandra cluster way too costly: we would have to buy 16 systems for the same amount of data, as opposed to the 4 we have now, and my IT director would strangle me.

-Adi