I have a 4-node Cassandra 0.8.5 cluster with RF=2.
One node started throwing exceptions in its log:
ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
... 3 more
I checked the disk and, sure enough, it is 100% full.
How do I recover from this without losing the data? I've got plenty of space on the other nodes, so I thought of doing a decommission, which as I understand it reassigns the node's ranges to the other nodes and streams its data to them. Once that's done, I plan to manually delete the data on the node and then rejoin it at the same position in the ring with auto-bootstrap turned off, so that it doesn't pull the old data back and can simply start accepting new data.
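For what it's worth, here is the rough arithmetic behind "plenty of space on the other nodes". It is only a sketch: it assumes the roughly 50 GB per node that OpsCenter reports is the real replicated load and that the ring is evenly balanced, neither of which I have verified. The function name is just something I made up for illustration.

```python
def load_per_node_after_decommission(nodes, load_per_node_gb):
    """Estimate the load on each remaining node once one node leaves.

    Assumes an evenly balanced ring. The total replicated data
    (nodes * load_per_node_gb) stays the same because RF is unchanged;
    it is simply spread over one node fewer.
    """
    total_gb = nodes * load_per_node_gb
    return total_gb / (nodes - 1)


if __name__ == "__main__":
    # 4 nodes at roughly 50 GB each, per OpsCenter (an estimate, not measured with du)
    estimate = load_per_node_after_decommission(4, 50)
    print(f"about {estimate:.1f} GB per remaining node")
```

So each of the three remaining nodes would grow from about 50 GB to somewhere around 67 GB, if those assumptions hold.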
Note that I would like to keep all 4 nodes in the cluster, because the other three can barely handle the input load on their own. These are just long-running tests until I get some better machines.
One strange thing I found is that the data folder on the node that filled up its disk is 150 GB (as measured with du), while the data folder on each of the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size of around 50 GB for all 4 nodes. I thought the node was in the middle of a major compaction when it filled up the disk, but even that doesn't make sense: shouldn't a major compaction at most double the size, not triple it? Does anyone know how to explain this behavior?
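To narrow down where the extra 100 GB lives, I am thinking of breaking the du figure down with something like the script below. It is a minimal sketch and rests on assumptions I have not confirmed for 0.8.5: that snapshot files sit under a directory named "snapshots", that in-progress files carry "-tmp-" in their name, and that the data directory is /var/lib/cassandra/data. It also sums apparent file sizes, so the totals can differ slightly from what du reports.

```python
import os
from collections import defaultdict


def size_by_category(data_dir):
    """Sum file sizes under data_dir, grouped into rough categories.

    The categories rely on assumed naming conventions:
    - "snapshots": any file below a directory named "snapshots"
    - "tmp": any file whose name contains "-tmp-"
    - "live": everything else
    """
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file may disappear while we walk (e.g. removed after compaction).
                continue
            parts = os.path.relpath(path, data_dir).split(os.sep)
            if "snapshots" in parts[:-1]:
                category = "snapshots"
            elif "-tmp-" in name:
                category = "tmp"
            else:
                category = "live"
            totals[category] += size
    return dict(totals)


if __name__ == "__main__":
    # Assumed default location; adjust to the data_file_directories setting.
    for category, size in sorted(size_by_category("/var/lib/cassandra/data").items()):
        print(f"{category:10s} {size / 1024**3:8.1f} GB")
```

If most of the surplus shows up under "snapshots" or "tmp", that would at least point to leftover files rather than live data.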