> After 10 days my cluster crashes due to a java.lang.OutOfMemoryError during compaction of the big column family that contains roughly 95% of the data.
Does this column family have very wide rows?

> ... simply some tweaks I need to make in the yaml file.  I have tried:
The main things that reduce the impact compaction has on memory are:

in_memory_compaction_limit_in_mb
concurrent_compactors
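
As a rough sketch, both settings live in cassandra.yaml and could be lowered along these lines. The values here are illustrative assumptions, not tuned recommendations; compare them with the defaults in the cassandra.yaml that ships with your version before changing anything.

```yaml
# cassandra.yaml (fragment): illustrative values only, not recommendations

# Rows larger than this are compacted with the slower two-pass, on-disk path
# instead of being held in memory, so a lower value caps per-row heap use.
in_memory_compaction_limit_in_mb: 32

# Number of compactions allowed to run at the same time; fewer concurrent
# compactions means less heap used by compaction at any one moment.
concurrent_compactors: 1
```

Both changes trade compaction speed for lower heap pressure, so it is worth watching the pending compaction count after adjusting them.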

Off the top of my head I cannot think of any shortcuts taken by compaction if/when all data in an SSTable is past TTL. I think there was talk of something like that though.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 27/06/2012, at 2:38 AM, Nils Pommerien wrote:

Hello,
I am evaluating Cassandra in a log retrieval application.  My ring consists of 3 m2.xlarge instances (17.1 GB memory, 6.5 ECU (2 virtual cores with 3.25 EC2 Compute Units each), 420 GB of local instance storage, 64-bit platform) and I am writing at roughly 220 writes/sec.  Per day I am adding roughly 60GB of data.  All of this sounds simple and easy and all three nodes are humming along with basically no load.  

The issue is that I am writing all my data with a TTL of 10 days.  After 10 days my cluster crashes due to a java.lang.OutOfMemoryError during compaction of the big column family that contains roughly 95% of the data.  So basically after 10 days my data set is 600GB, and from then on Cassandra would have to tombstone and purge 60GB of data per day at the same rate of roughly 220 deletes/second.  I am not sure if Cassandra should be able to do it, whether I should take a partitioning approach (one CF per day), or if there are simply some tweaks I need to make in the yaml file.  I have tried:
  1. Decrease flush_largest_memtables_at to 0.4
  2. Set reduce_cache_sizes_at and reduce_cache_capacity_to to 1
Now, the issue remains the same:

WARN [ScheduledTasks:1] 2012-06-11 19:39:42,017 GCInspector.java (line 145) Heap is 0.9920103380107628 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically.

Eventually it will just die with this message.  This affects all nodes in the cluster, not just one. 
 
Dump file is incomplete: file size limit
ERROR 19:39:39,695 Exception in thread Thread[ReadStage:134,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR 19:39:39,724 Exception in thread Thread[MutationStage:57,5,main]
java.lang.OutOfMemoryError: Java heap space
      at org.apache.cassandra.utils.FBUtilities.hashToBigInteger(FBUtilities.java:213)
      at org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:154)
      at org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:47)
      at org.apache.cassandra.db.RowPosition.forKey(RowPosition.java:54)
 
Any help is highly appreciated.  It would be cool to tweak it so that I can have a moving window of 10 days in Cassandra while dropping the old data… Or, if there is any other recommended way to deal with such sliding time windows I am open to ideas.

Thank you for your help!