Partly, I guess this may be a side effect of multithreaded compaction?

Before running out of space completely, I do see a few of these:
 WARN [CompactionExecutor:448] 2011-05-02 01:08:10,480 CompactionManager.java (line 516) insufficient space to compact all requested files SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12858-Data.db'), SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12851-Data.db'), SSTableReader(path='/data/cassandra/JP_MALL_PH/Order-f-12864-Data.db')
 INFO [CompactionExecutor:448] 2011-05-02 01:08:10,481 StorageService.java (line 2066) requesting GC to free disk space

In this case, there would be 24 threads each asking whether there was enough free disk space.

Most of those checks probably succeeded, but in theory they could have requested 24x the available space, since I do not think there is any global pool in place that tracks how much disk space will be needed by compactions that have already started?
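
Just to illustrate what I mean by a global pool, here is a rough, purely hypothetical sketch (nothing like this exists in the Cassandra code as far as I know, and the class/method names are made up): each compaction would reserve its estimated output size against a shared counter before starting, and fail fast if the sum of reservations exceeds the currently free space.

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: shared accounting of disk space already promised to
// running compactions, so that 24 threads cannot each "succeed" against the
// same free space.
public class CompactionDiskReservations
{
    private final AtomicLong reserved = new AtomicLong(0);

    // freeBytesOnDisk is the free space on the data volume as measured by the
    // caller; a real implementation would also shrink the reservation as
    // output is actually written.
    public boolean tryReserve(long estimatedBytes, long freeBytesOnDisk)
    {
        while (true)
        {
            long current = reserved.get();
            if (current + estimatedBytes > freeBytesOnDisk)
                return false; // once running compactions are counted, there is not enough space
            if (reserved.compareAndSet(current, current + estimatedBytes))
                return true;
        }
    }

    // Release the reservation when the compaction finishes or is aborted.
    public void release(long estimatedBytes)
    {
        reserved.addAndGet(-estimatedBytes);
    }
}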

Of course, regardless of how much checking is done in advance, we could still run out of disk, so I guess there is also a need to check whether disk space is about to run out while a compaction runs, so that it can be halted/aborted. Unfortunately that would need global coordination so that we do not stop all compaction threads at once....
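
Roughly what I have in mind for the in-flight check is something like the sketch below (again purely hypothetical, names made up); the hard part is the coordination, i.e. deciding which compactions to abort so we do not kill all of them at once.

import java.io.File;
import java.io.IOException;

// Hypothetical in-flight check: abort a running compaction if free disk space
// drops below what it still expects to write.
public class DiskWatchdog
{
    private final File dataDirectory;

    public DiskWatchdog(File dataDirectory)
    {
        this.dataDirectory = dataDirectory;
    }

    // Called periodically from the compaction loop, e.g. every N rows written.
    public void checkOrAbort(long remainingEstimatedBytes) throws IOException
    {
        long free = dataDirectory.getUsableSpace();
        if (free < remainingEstimatedBytes)
            throw new IOException("aborting compaction: only " + free
                                  + " bytes free, still need about " + remainingEstimatedBytes);
    }
}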

After reducing to 6 compaction threads in 0.8 beta2, the data has compacted just fine without any disk space issues. So I guess another problem you may hit, once you have a lot of sstables containing updates (that is, duplicates) to the same data, is that massively concurrent compaction with nproc threads can of course also duplicate all of those duplicates concurrently, on a large scale.
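
For anyone wanting to do the same: I believe the knob is concurrent_compactors in cassandra.yaml (assuming your 0.8 build exposes it; by default you get one compaction thread per core), so capping it looks something like:

# cassandra.yaml
concurrent_compactors: 6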

Yes, this is an argument in favour of multithreaded compaction, since it should normally help keep the number of sstables at a sane level and avoid such problems. But it is unfortunately just a kludge around the real problem, which is to segment the sstables somehow per keyspace so we can bring down the disk requirements and recover from scenarios where disk usage gets above 50%.

Regards, 
Terje



On Wed, May 4, 2011 at 3:33 PM, Terje Marthinussen <tmarthinussen@gmail.com> wrote:
Well, I just did not look at these logs very well at all last night.
First out-of-disk message:
ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[CompactionExecutor:387,1,main]
java.io.IOException: No space left on device

Then, finally, the last one:
ERROR [FlushWriter:128] 2011-05-02 01:51:06,112 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[FlushWriter:128,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 554962 bytes
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Insufficient disk space to flush 554962 bytes
        at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597)
        at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100)
        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
        at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50)
        at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 3 more
 INFO [CompactionExecutor:451] 2011-05-02 01:51:06,113 StorageService.java (line 2066) requesting GC to free disk space
[lots of sstables deleted]

After this it starts running again (although not entirely fine, it seems).

So the disk seems to have been full for 35 minutes due to un-deleted sstables.

Terje

On Wed, May 4, 2011 at 6:34 AM, Terje Marthinussen <tmarthinussen@gmail.com> wrote:
Hm... peculiar.

Post flush is not involved in compactions, right?

May 2nd
01:06 - Out of disk
01:51 - Starts a mix of major and minor compactions on different column families
It then starts a few more minor compactions over the day, but given that there are more than 1000 sstables and we are only talking about 3 minor compactions started, I do not think that is normal.
May 3rd: 1 minor compaction started.

When I checked today, there was a bunch of tmp files on the disk with last modified times from 01:something on May 2nd, and 200GB of empty disk...

Definitely no compaction going on. 
Guess I will add some debug logging and see if I get lucky and run out of disk again.

Terje

On Wed, May 4, 2011 at 5:06 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
Compaction does, but flush didn't until
https://issues.apache.org/jira/browse/CASSANDRA-2404

On Tue, May 3, 2011 at 2:26 PM, Terje Marthinussen
<tmarthinussen@gmail.com> wrote:
> Yes, I realize that.
> I am a bit curious why it ran out of disk, or rather, why I have 200GB of
> empty disk now, but unfortunately it seems we may not have had monitoring
> enabled on this node to tell me what happened in terms of disk usage.
> I also thought that compaction was supposed to resume (try again with less
> data) if it fails?
> Terje
>
> On Wed, May 4, 2011 at 3:50 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> post flusher is responsible for updating commitlog header after a
>> flush; each task waits for a specific flush to complete, then does its
>> thing.
>>
>> so when you had a flush catastrophically fail, its corresponding
>> post-flush task will be stuck.
>>
>> On Tue, May 3, 2011 at 1:20 PM, Terje Marthinussen
>> <tmarthinussen@gmail.com> wrote:
>> > Just some very tiny amount of writes in the background here (some hints
>> > spooled up on another node slowly coming in).
>> > No new data.
>> >
>> > I thought there were no exceptions, but I did not look far enough back
>> > in the log at first.
>> > Going back a bit further now, however, I see that about 50 hours ago:
>> > ERROR [CompactionExecutor:387] 2011-05-02 01:16:01,027
>> > AbstractCassandraDaemon.java (line 112) Fatal exception in thread
>> > Thread[CompactionExecutor:387,1,main]
>> > java.io.IOException: No space left on device
>> >         at java.io.RandomAccessFile.writeBytes(Native Method)
>> >         at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>> >         at
>> >
>> > org.apache.cassandra.io.util.BufferedRandomAccessFile.flush(BufferedRandomAccessFile.java:160)
>> >         at
>> >
>> > org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:225)
>> >         at
>> >
>> > org.apache.cassandra.io.util.BufferedRandomAccessFile.writeAtMost(BufferedRandomAccessFile.java:356)
>> >         at
>> >
>> > org.apache.cassandra.io.util.BufferedRandomAccessFile.write(BufferedRandomAccessFile.java:335)
>> >         at
>> > org.apache.cassandra.io.PrecompactedRow.write(PrecompactedRow.java:102)
>> >         at
>> >
>> > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:130)
>> >         at
>> >
>> > org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:566)
>> >         at
>> >
>> > org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:146)
>> >         at
>> >
>> > org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:112)
>> >         at
>> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >         at
>> >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >         at
>> >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >         at java.lang.Thread.run(Thread.java:662)
>> > [followed by a few more of those...]
>> > and then a bunch of these:
>> > ERROR [FlushWriter:123] 2011-05-02 01:21:12,690
>> > AbstractCassandraDaemon.java
>> > (line 112) Fatal exception in thread Thread[FlushWriter:123,5,main]
>> > java.lang.RuntimeException: java.lang.RuntimeException: Insufficient
>> > disk
>> > space to flush 40009184 bytes
>> >         at
>> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>> >         at
>> >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >         at
>> >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >         at java.lang.Thread.run(Thread.java:662)
>> > Caused by: java.lang.RuntimeException: Insufficient disk space to flush
>> > 40009184 bytes
>> >         at
>> >
>> > org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:597)
>> >         at
>> >
>> > org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2100)
>> >         at
>> > org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:239)
>> >         at org.apache.cassandra.db.Memtable.access$400(Memtable.java:50)
>> >         at
>> > org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:263)
>> >         at
>> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>> >         ... 3 more
>> > Seems like compactions stopped after this (a bunch of tmp tables are
>> > still there from when those errors were generated), and I can only
>> > suspect the post flusher may have stopped at the same time.
>> > There is 890GB of disk for data, sstables are currently using 604GB
>> > (139GB is old tmp tables from when it ran out of disk), and "ring"
>> > tells me the load on the node is 313GB.
>> > Terje
>> >
>> >
>> > On Wed, May 4, 2011 at 3:02 AM, Jonathan Ellis <jbellis@gmail.com>
>> > wrote:
>> >>
>> >> ... and are there any exceptions in the log?
>> >>
>> >> On Tue, May 3, 2011 at 1:01 PM, Jonathan Ellis <jbellis@gmail.com>
>> >> wrote:
>> >> > Does it resolve down to 0 eventually if you stop doing writes?
>> >> >
>> >> > On Tue, May 3, 2011 at 12:56 PM, Terje Marthinussen
>> >> > <tmarthinussen@gmail.com> wrote:
>> >> >> Cassandra 0.8 beta trunk from about 1 week ago:
>> >> >> Pool Name                    Active   Pending      Completed
>> >> >> ReadStage                         0         0              5
>> >> >> RequestResponseStage              0         0          87129
>> >> >> MutationStage                     0         0         187298
>> >> >> ReadRepairStage                   0         0              0
>> >> >> ReplicateOnWriteStage             0         0              0
>> >> >> GossipStage                       0         0        1353524
>> >> >> AntiEntropyStage                  0         0              0
>> >> >> MigrationStage                    0         0             10
>> >> >> MemtablePostFlusher               1       190            108
>> >> >> StreamStage                       0         0              0
>> >> >> FlushWriter                       0         0            302
>> >> >> FILEUTILS-DELETE-POOL             0         0             26
>> >> >> MiscStage                         0         0              0
>> >> >> FlushSorter                       0         0              0
>> >> >> InternalResponseStage             0         0              0
>> >> >> HintedHandoff                     1         4              7
>> >> >>
>> >> >> Anyone with nice theories about the pending value on the memtable
>> >> >> post flusher?
>> >> >> Regards,
>> >> >> Terje
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Jonathan Ellis
>> >> > Project Chair, Apache Cassandra
>> >> > co-founder of DataStax, the source for professional Cassandra support
>> >> > http://www.datastax.com
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of DataStax, the source for professional Cassandra support
>> >> http://www.datastax.com
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com