We are currently evaluating Cassandra 2.0 for use in a project.
The cluster consists of 5 identical nodes, each with 16 GB RAM, a 6-core Xeon and a 2 TB hard disk.
The maximum heap size is set to 8 GB, and row_cache_size_in_mb=0.
The last test was a write test that ran for several days (almost exclusively write requests) and inserted 850,000 keys and columns into a single column family, resulting in about 170 GB of data stored on each node. One node died with an OOM and I'm not able to bring it up again. It keeps crashing with an OOM during commit log replay:
ERROR [MutationStage:20] 2013-10-30 08:35:23,160 CassandraDaemon.java (line 186) Exception in thread Thread[MutationStage:20,5,main]
java.lang.OutOfMemoryError: Java heap space
 at edu.stanford.ppl.concurrent.SnapTreeMap.comparable(SnapTreeMap.java:534)
 at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1019)
 at edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
 at org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:312)
 at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:184)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:255)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:171)
 at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
 at org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:265)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)

I also have a heap dump from the crash during startup.
The commit log directory has a total size of nearly 3 GB.
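For anyone reproducing the measurement above, a minimal Python sketch that sums the size of the commit log directory. The path `/var/lib/cassandra/commitlog` is an assumption (a common packaged default for `commitlog_directory`); substitute whatever `cassandra.yaml` points to on the affected node.

```python
from pathlib import Path

# Assumed location; check commitlog_directory in cassandra.yaml on the node.
COMMITLOG_DIR = Path("/var/lib/cassandra/commitlog")


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files directly inside `path` (non-recursive)."""
    return sum(p.stat().st_size for p in path.iterdir() if p.is_file())


if __name__ == "__main__":
    if COMMITLOG_DIR.is_dir():
        size = dir_size_bytes(COMMITLOG_DIR)
        print(f"{COMMITLOG_DIR}: {size / 2**30:.2f} GiB")
    else:
        print(f"{COMMITLOG_DIR} not found; adjust the path")
```

This only measures the serialized segment size on disk; it says nothing about how much heap the replayed mutations occupy once they are in memtables.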

I know that I can clear the commit log directory to bring the node up; since it is only test data, that is no problem for us. But the more interesting question is: how can we prevent this?

Fabian Seifert