incubator-cassandra-user mailing list archives

From Jahangir Mohammed
Subject Re: Insufficient disk space to flush
Date Thu, 01 Dec 2011 21:08:10 GMT
Yes, mostly sounds like it. In our case failed repairs were causing
accumulation of the tmp files.
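
For anyone wanting to check whether leftover tmp files explain a gap between du and the expected data size, here is a rough sketch in Python. It is only an illustration: the "-tmp-" file name marker is an assumption based on the *tmp* pattern mentioned in this thread, and the data directory path is a hypothetical default, so adjust both to your installation.

```python
import os
import sys


def tmp_file_usage(data_dir, marker="-tmp-"):
    """Return (count, total_bytes) for files under data_dir whose name contains marker.

    The marker is an assumption about how temporary SSTable files are named;
    change it if your files look different.
    """
    count = 0
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if marker not in name:
                continue
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file may vanish while we scan (e.g. a compaction finished).
                continue
            count += 1
    return count, total


if __name__ == "__main__":
    # Hypothetical default location; pass the real data directory as an argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data"
    n, size = tmp_file_usage(data_dir)
    print("%d tmp files, %.1f MB" % (n, size / (1024.0 * 1024.0)))
```

This only reports; it deletes nothing. If the tmp total roughly matches the unexplained disk usage, a restart (as Jeremiah described) should reclaim it.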

Jahangir Mohammed.

On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe wrote:

> Hi Jeremiah,
>  My commitlog was indeed on another disk. I did what you said and yes, the
> node restart brings the disk usage back to the roughly 50 GB I was expecting.
> Still I do not understand how the node managed to get itself in the
> situation of having these tmp files? Could you clarify what these are, how
> they are produced and why? I've tried to find a clear definition but all I
> could come up with is hints that they are produced during compaction. I
> also found a thread that described a similar problem:
> as described there it seems like compaction fails and tmp files don't get
> cleaned up until they fill the disk. Is this what happened in my case?
> Compactions did not finish properly because the disk utilization was more
> than half, and then more and more tmp files accumulated with each
> further attempt. The Cassandra log would indicate this because I get
> many of these:
> ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200
> (line 513) insufficient space to compact even the two smallest files, aborting
> before I started getting many of these:
> ERROR [FlushWriter:283] 2011-12-01 04:12:22,917
> (line 139) Fatal exception in thread
> Thread[FlushWriter:283,5,main] java.lang.RuntimeException:
> java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes
> I just want to clearly understand what happened.
> Thanks,
> Alex
> On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan wrote:
>>  If you are writing data with QUORUM or ALL you should be safe to restart
>> cassandra on that node.  If the extra space is all from *tmp* files from
>> compaction they will get deleted at startup.  You will then need to run
>> repair on that node to get back any data that was missed while it was
>> full.  If your commit log was on a different device you may not even have
>> lost much.
>> -Jeremiah
>> On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
>> Hello everyone,
>>  4 node Cassandra 0.8.5 cluster with RF =2.
>>  One node started throwing exceptions in its log:
>> ERROR 10:02:46,837 Fatal exception in thread
>> Thread[FlushWriter:1317,5,main]
>> java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk
>> space to flush 17296 bytes
>>         at
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
>>         at
>> java.util.concurrent.ThreadPoolExecutor$
>>         at
>> Caused by: java.lang.RuntimeException: Insufficient disk space to flush
>> 17296 bytes
>>         at
>> org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(
>>         at
>> org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(
>>         at
>> org.apache.cassandra.db.Memtable.writeSortedContents(
>>         at org.apache.cassandra.db.Memtable.access$400(
>>         at
>> org.apache.cassandra.db.Memtable$3.runMayThrow(
>>         at
>>         ... 3 more
>> Checked disk and obviously it's 100% full.
>> How do I recover from this without losing the data? I've got plenty of
>> space on the other nodes, so I thought of doing a decommission which I
>> understand reassigns ranges to the other nodes and replicates data to them.
>> After that's done I plan on manually deleting the data on the node and then
>> joining in the same cluster position with auto-bootstrap turned off so that
>> I won't get back the old data and I can continue getting new data with the
>> node.
>> Note, I would like to have 4 nodes in because the other three barely take
>> the input load alone. These are just long running tests until I get some
>> better machines.
>> One strange thing I found is that the data folder on the node that filled
>> up the disk is 150 GB (as measured with du) while the data folder on all
>> other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size
>> of around 50 GB for all 4 nodes. I thought that the node was doing a major
>> compaction at which time it filled up the disk... but even that doesn't
>> make sense, because shouldn't a major compaction only be able to double
>> the size, not triple it? Does anyone know how to explain this behavior?
>> Thanks,
>> Alex
