If you are writing data with QUORUM or ALL, you should be safe to
restart Cassandra on that node. If the extra space is all from
*tmp* files left over from compaction, they will get deleted at
startup. You will then need to run repair on that node to get back
any data that was missed while it was full. If your commit log was
on a different device, you may not even have lost much.
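For what it's worth, if the disk is too full for the node to even start, the tmp cleanup can be done by hand first. A minimal sketch, simulated here on a throwaway directory — the data path and SSTable file names are invented for illustration, not taken from the cluster above:

```shell
# Simulate a data directory holding one live SSTable and one *tmp*
# file left behind by an interrupted compaction.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/Keyspace1"
touch "$DATA_DIR/Keyspace1/Standard1-g-12-Data.db"       # live SSTable
touch "$DATA_DIR/Keyspace1/Standard1-tmp-g-13-Data.db"   # leftover tmp file

# Cassandra 0.8 removes these on startup anyway; deleting them by hand
# first frees enough space for the node to come up at all:
find "$DATA_DIR" -name '*tmp*' -type f -delete

ls "$DATA_DIR/Keyspace1"
```

Once the node is back up, run repair (`nodetool -h <node> repair`) to re-sync whatever was written while it was full.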
On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
4-node Cassandra 0.8.5 cluster with RF=2.
One node started throwing exceptions in its log:
ERROR 10:02:46,837 Fatal exception in thread
Insufficient disk space to flush 17296 bytes
Caused by: java.lang.RuntimeException: Insufficient disk space to
flush 17296 bytes
... 3 more
Checked disk and obviously it's 100% full.
How do I recover from this without losing the data? I've got
plenty of space on the other nodes, so I thought of doing a
decommission, which I understand reassigns ranges to the other
nodes and replicates data to them. After that's done I plan on
manually deleting the data on the node and then rejoining at the
same cluster position with auto-bootstrap turned off, so that I
won't get back the old data and can continue taking new data
on the node.
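If it helps, the rejoin step described above amounts to a small cassandra.yaml change plus the decommission itself (`nodetool -h <node> decommission`, then wipe the data directory). A hedged sketch of the 0.8.x settings involved — the token below is a placeholder, not the node's actual token:

```yaml
# Hypothetical cassandra.yaml fragment for rejoining at the same ring
# position after wiping the data directory (Cassandra 0.8.x):
auto_bootstrap: false    # do not stream the old data back in on startup
initial_token: 85070591730234615865843651857942052864   # placeholder; use the node's former token
```

Note that a node rejoining this way starts empty for its range, so it will serve incomplete reads until a repair is run against it.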
Note, I would like to keep all 4 nodes in, because the other three
barely handle the input load alone. These are just long-running
tests until I get some better machines.
One strange thing I found is that the data folder on the node that
filled up the disk is 150 GB (as measured with du), while the data
folder on all other 3 nodes is 50 GB. At the same time, DataStax
OpsCenter shows a size of around 50 GB for all 4 nodes. I thought
that the node was making a major compaction at which time it
filled up the disk... but even that doesn't make sense, because
shouldn't a major compaction at most double the size, not triple
it? Does anyone know how to explain this behavior?
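A guess at the du-vs-OpsCenter gap: du counts everything sitting in the directory, while OpsCenter (like the Load column in nodetool ring) reports only the live SSTable size, so obsolete pre-compaction SSTables awaiting deletion and leftover *tmp* files inflate one number but not the other. A contrived sketch of the effect — file names and sizes are invented:

```shell
# du sees all bytes on disk, whether or not the node considers them live.
DIR=$(mktemp -d)
dd if=/dev/zero of="$DIR/Standard1-g-1-Data.db"     bs=1024 count=100 2>/dev/null  # obsolete pre-compaction SSTable
dd if=/dev/zero of="$DIR/Standard1-tmp-g-2-Data.db" bs=1024 count=100 2>/dev/null  # interrupted compaction output

# du reports ~200 KB here even if the "live" data is a fraction of that:
du -sk "$DIR"
```

On that reading, the 150 GB vs 50 GB discrepancy would be mostly dead weight that a restart (tmp cleanup) and GC of compacted SSTables should reclaim.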