Do you have a log message for the OOM? And some GC messages around it? Have you tried watching the server with jconsole?

Is the OOM happening on system start or after it's been running? Or both?

Do you have any row/key caches? I cannot remember whether 0.6.* has it, but have you enabled the save cache feature?

Aaron
 
On 02 Dec, 2010, at 01:28 PM, Aram Ayazyan <ayazyan@gmail.com> wrote:

Hi,

We have a small cluster of 3 Cassandra servers running w/ full
replication. Every once in a while we get an OutOfMemory exception and
have to restart servers. Sometimes just restarting doesn't do it and
we have to clean the commitlog or data directory.

We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
families. There are fewer than 1000 keys across all column families.
There is roughly 1 write request and 1 read request per second. Each
server is allocated 1GB. Size of all files in data directory of the
only column family is ~300MB. MemtableThroughputInMB is throttled way
down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
were running out of memory extremely fast, this way it works for a
couple of days w/o crashing).

Last time this issue happened, I didn't clear the commitlog/data
folders, enabled gc logging and restarted Cassandra. It crashes really
fast, but what is really strange is that it seems like it still has
plenty of memory when the error happens, last 3 lines from gc log:
21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
The full log is here: http://pastebin.com/XGRSRcBd
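
To make the "plenty of memory" observation concrete, here is a minimal sketch that parses GC lines in the format shown above and reports how much of the heap is still in use after each collection. The regular expression and the heap_usage helper are illustrative assumptions, not part of Cassandra or the JVM, and the pattern only covers this simple one-line format.

```python
import re

# Assumed pattern for the simple one-line GC format quoted above:
# "<uptime>: [GC <before>K-><after>K(<total>K), <pause> secs]"
GC_LINE = re.compile(
    r"(?P<uptime>\d+\.\d+): \[(?:Full )?GC "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\), "
    r"(?P<pause>\d+\.\d+) secs\]"
)


def heap_usage(lines):
    """Return one (uptime_secs, freed_kb, used_after_kb, total_kb, pct_used)
    tuple per line that matches GC_LINE; other lines are skipped."""
    rows = []
    for line in lines:
        m = GC_LINE.search(line)
        if m is None:
            continue
        before = int(m.group("before"))
        after = int(m.group("after"))
        total = int(m.group("total"))
        rows.append(
            (float(m.group("uptime")), before - after, after, total,
             100.0 * after / total)
        )
    return rows


# The three lines quoted from the GC log above.
sample = [
    "21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]",
    "21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]",
    "21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]",
]

for uptime, freed, after, total, pct in heap_usage(sample):
    print(f"{uptime:8.3f}s  freed {freed}K, {after}K of {total}K in use ({pct:.1f}%)")
```

On these three lines the heap is less than half full after each collection, and each collection frees under 1MB, which matches the description above. Running the same function over the full log would show whether that holds right up to the error.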

I've tried increasing the memory up to 1.5GB, but it still doesn't start.

Any ideas what might be the problem here?

Thank you,
Aram