incubator-cassandra-user mailing list archives

From Jeremiah Jordan <jeremiah.jor...@morningstar.com>
Subject Re: Insufficient disk space to flush
Date Thu, 01 Dec 2011 17:58:42 GMT
If you are writing data with QUORUM or ALL you should be safe to restart 
Cassandra on that node.  If the extra space is all from *tmp* files left 
by compaction, they will be deleted at startup.  You will then need to run 
repair on that node to get back any writes it missed while the disk was 
full.  If your commit log is on a different device, you may not even 
have lost much.
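
Before restarting, it may be worth checking how much of the data directory really is *tmp* files. The following is only a rough sketch, not anything shipped with Cassandra: the function name `tmp_usage` is made up for illustration, the example path is an assumption (use whatever your data_file_directories setting points to), and it simply treats any file with "tmp" in its name as a compaction leftover, per the *tmp* pattern above.

```python
import os


def tmp_usage(data_dir, marker="tmp"):
    """Return (tmp_bytes, total_bytes) for all files under data_dir.

    A file counts as temporary if `marker` appears in its file name.
    This is a naming heuristic (an assumption), not a Cassandra API.
    """
    tmp_bytes = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file disappeared or is unreadable; skip it.
                continue
            total_bytes += size
            if marker in name:
                tmp_bytes += size
    return tmp_bytes, total_bytes


# Example call (the path is an assumption, adjust it to your install):
# tmp_bytes, total_bytes = tmp_usage("/var/lib/cassandra/data")
```

If the tmp total accounts for most of the gap between this node and the others, a restart should free that space; if not, something else is holding the disk and is worth finding first.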

-Jeremiah

On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
> Hello everyone,
>  4-node Cassandra 0.8.5 cluster with RF=2.
>  One node started throwing exceptions in its log:
>
> ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
>         at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
>         at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
>         at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
>         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
>         at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
>
> Checked disk and obviously it's 100% full.
>
> How do I recover from this without losing the data? I've got plenty 
> of space on the other nodes, so I thought of doing a decommission 
> which I understand reassigns ranges to the other nodes and replicates 
> data to them. After that's done I plan on manually deleting the data 
> on the node and then joining in the same cluster position with 
> auto-bootstrap turned off so that I won't get back the old data and I 
> can continue getting new data with the node.
>
> Note, I would like to have 4 nodes in because the other three barely 
> take the input load alone. These are just long running tests until I 
> get some better machines.
>
> One strange thing I found is that the data folder on the node that 
> filled up the disk is 150 GB (as measured with du) while the data 
> folder on each of the other 3 nodes is 50 GB. At the same time, DataStax 
> OpsCenter shows a size of around 50 GB for all 4 nodes. I thought that 
> the node was running a major compaction when it filled up the 
> disk, but even that doesn't make sense: shouldn't a major 
> compaction at most double the size, not triple it? 
> Does anyone know how to explain this behavior?
>
> Thanks,
> Alex
>
