There is no way to set a max size on an sstable file. If your Cassandra data directory is not on your / (root) filesystem, you could reformat it as ext4 (or at least as ext3 with better options, such as a larger block size).
OK, so my problem persisted. The node that is filling up its hard disk has a 230 GB disk. Right after I restart the node, it deletes the tmp files and comes down to 55 GB of data on disk. Then it starts to fill up the disk quickly (I see gigabytes added fast), and it's not real data, because the other nodes don't have it.
While all this is happening, I see the node running a minor compaction of the main data CF, but extremely slowly. Today I saw this error:
ERROR 09:44:57,605 Fatal exception in thread Thread[CompactionExecutor:15,1,main]
java.io.IOException: File too large
at java.io.RandomAccessFile.writeBytes(Native Method)
This means it cannot finish that compaction because it hit the maximum file size. So I checked the file system and block size: ext3 with 1K blocks, which means the maximum file size is 16 GB.
I didn't know what to do in this case, so I just decommissioned the node.
Is there a way to get around this maximum file size limit? Is there some Cassandra configuration that helps avoid this? I'm asking here because I couldn't find anything about it in the documentation.
I'm waiting for new machines to run Cassandra on. What file systems are people using?
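The 16 GB figure follows from how ext3 maps file blocks. The sketch below is a simplified model, assuming classic block-mapped inodes (12 direct pointers plus one single-, one double- and one triple-indirect block, 4-byte block numbers) and a 32-bit `i_blocks` counter measured in 512-byte sectors. It ignores the space taken by the indirect blocks themselves, so treat the results as approximate.

```python
# Simplified model of ext3 per-file size limits.
# Assumptions: block-mapped inodes with 12 direct pointers, one single-,
# one double- and one triple-indirect pointer, 4-byte block numbers, and a
# 32-bit i_blocks field counted in 512-byte sectors. Indirect-block
# overhead is ignored, so the numbers are approximate.

POINTER_SIZE = 4          # bytes per block number
DIRECT_POINTERS = 12
SECTOR_SIZE = 512
I_BLOCKS_MAX = 2 ** 32    # i_blocks is a 32-bit count of 512-byte sectors


def addressable_bytes(block_size: int) -> int:
    """Bytes reachable via direct + single/double/triple indirect blocks."""
    per_block = block_size // POINTER_SIZE   # pointers per indirect block
    blocks = DIRECT_POINTERS + per_block + per_block ** 2 + per_block ** 3
    return blocks * block_size


def max_file_size(block_size: int) -> int:
    """Approximate ext3 max file size: the smaller of the two limits."""
    return min(addressable_bytes(block_size), I_BLOCKS_MAX * SECTOR_SIZE)


if __name__ == "__main__":
    GiB = 2 ** 30
    for bs in (1024, 2048, 4096):
        print(f"{bs:>5}-byte blocks: ~{max_file_size(bs) / GiB:,.1f} GiB")
```

With 1K blocks this gives roughly 16 GiB, matching the limit hit above; with 4K blocks the `i_blocks` counter rather than the pointer tree becomes the limit, at about 2 TiB.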
Alex
On Thu, Dec 1, 2011 at 10:08 PM, Jahangir Mohammed <email@example.com> wrote:
Yes, it mostly sounds like it. In our case, failed repairs were causing tmp files to accumulate.
Thanks,
Jahangir Mohammed
On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe <firstname.lastname@example.org> wrote:
My commitlog was indeed on another disk. I did what you said, and yes, restarting the node brings the disk usage back to around the 50 GB I was expecting. Still, I do not understand how the node got itself into the situation of having these tmp files. Could you clarify what they are, how they are produced, and why? I've tried to find a clear definition, but all I could come up with were hints that they are produced during compaction. I also found a thread that described a similar problem:
As described there, it seems that compaction fails and the tmp files don't get cleaned up, so they accumulate until they fill the disk. Is this what happened in my case? Compactions did not finish properly because disk utilization was above 50%, and then more and more tmp files accumulated with each subsequent attempt. The Cassandra log seems to indicate this, because I get many of these:
ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200 CompactionManager.java (line 513) insufficient space to compact even the two smallest files, aborting
before I started getting many of these:
ERROR [FlushWriter:283] 2011-12-01 04:12:22,917 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[FlushWriter:283,5,main] java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes
I just want to clearly understand what happened.
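To make the "more than half full" hypothesis concrete, here is a hypothetical, much-simplified model of a pre-compaction space check. It is not Cassandra's actual code: the function name is made up, and it assumes that the compacted output can be as large as the sum of its inputs and that the largest candidate sstable is dropped until the rest fit or fewer than two remain, which is what the log message suggests.

```python
# Hypothetical, simplified model of a pre-compaction disk-space check.
# NOT Cassandra's implementation. Assumptions: the compacted output may be
# as large as the sum of its inputs, and the largest candidate is dropped
# until the remaining set fits or fewer than two sstables remain.

from typing import List, Optional


def choose_compaction_set(sstable_sizes: List[int],
                          free_bytes: int) -> Optional[List[int]]:
    """Return the sstable sizes to compact, or None if not even two fit.

    Expects at least two candidate sstables.
    """
    candidates = sorted(sstable_sizes)
    # Worst case: output size == total input size (nothing to purge).
    while len(candidates) >= 2 and sum(candidates) > free_bytes:
        candidates.pop()  # drop the largest remaining sstable and retry
    if len(candidates) < 2:
        print("insufficient space to compact even the two smallest files, aborting")
        return None
    return candidates
```

For example, with candidate sstables of 10, 12, 15 and 40 GB and 20 GB free, nothing can be compacted (`None`), while 30 GB free allows only the two smallest to be merged. Once free space drops below the combined size of the two smallest sstables, this model never compacts again, so the number of sstables, and the disk usage, can only grow.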
Alex
On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan <email@example.com> wrote:
If you are writing data with QUORUM or ALL, you should be safe to restart Cassandra on that node. If the extra space is all from *tmp* files from compaction, they will get deleted at startup. You will then need to run repair on that node to get back any data that was missed while it was full. If your commit log was on a different device, you may not even have lost much.
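Before restarting, one way to check whether the extra space really is compaction leftovers is to total the temporary files under the data directory. This is a sketch that assumes temporary SSTable components carry a `-tmp-` marker in their file names (for example a hypothetical `MyCF-tmp-g-42-Data.db`); verify the naming against your own data directory before relying on it.

```python
# Sketch: sum the bytes held by temporary SSTable components under a
# Cassandra data directory. ASSUMPTION: temporary components have a
# "-tmp-" marker in their file names; adjust TMP_MARKER if yours differ.

import os
import sys
from typing import Tuple

TMP_MARKER = "-tmp-"


def tmp_usage(data_dir: str) -> Tuple[int, int]:
    """Return (bytes in tmp files, bytes in all files) below data_dir."""
    tmp_bytes = total_bytes = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                continue  # file vanished mid-scan (e.g. compaction finished)
            total_bytes += size
            if TMP_MARKER in name:
                tmp_bytes += size
    return tmp_bytes, total_bytes


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    tmp, total = tmp_usage(directory)
    GiB = 2 ** 30
    print(f"tmp: {tmp / GiB:.1f} GiB of {total / GiB:.1f} GiB total")
```

It takes the data directory as its only argument, e.g. `/var/lib/cassandra/data` if you kept the default `data_file_directories` setting. If most of the difference between `du` and the size reported by Cassandra shows up as tmp bytes, a restart should reclaim it.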
On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
Hello everyone,
4-node Cassandra 0.8.5 cluster with RF = 2.
One node started throwing exceptions in its log:
ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
... 3 more
I checked the disk and, obviously, it's 100% full.
How do I recover from this without losing data? I've got plenty of space on the other nodes, so I thought of doing a decommission, which I understand reassigns the node's ranges to the other nodes and replicates the data to them. After that's done, I plan to manually delete the data on the node and then rejoin the cluster at the same position with auto-bootstrap turned off, so that I won't get the old data back and the node can continue receiving new data.
Note that I would like to keep 4 nodes, because the other three can barely handle the input load alone. These are just long-running tests until I get some better machines.
One strange thing I found is that the data folder on the node that filled up the disk is 150 GB (as measured with du), while the data folder on each of the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size of around 50 GB for all 4 nodes. I thought the node was doing a major compaction, during which it filled up the disk, but even that doesn't make sense: shouldn't a major compaction at most double the size, not triple it? Does anyone know how to explain this behavior?
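The doubling-versus-tripling puzzle can be written out as simple arithmetic. This assumes, as the question does, that a single major compaction writes at most one new copy of the live data before the old sstables are deleted; the figures are the ones quoted above.

```python
# Back-of-the-envelope check (figures in GB, taken from the message above).
# Assumption: one major compaction writes at most one full new copy of the
# live data before the old sstables are removed, so peak usage <= 2x live.

live_gb = 50        # size reported by OpsCenter (and by du on the other nodes)
observed_gb = 150   # du on the node that filled up

peak_single_compaction_gb = 2 * live_gb
unexplained_gb = observed_gb - peak_single_compaction_gb

print(f"expected peak ~{peak_single_compaction_gb} GB, "
      f"observed {observed_gb} GB, ~{unexplained_gb} GB unexplained")
```

The roughly 50 GB left over is consistent with, though it does not prove, tmp files left behind by earlier failed compactions rather than a single compaction in progress.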
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow