From Peter Schuller <>
Subject Re: Node OOM Problems
Date Thu, 19 Aug 2010 21:42:36 GMT
> What is my "live set"?

Sorry; that meant the "set of data actually live (i.e., not garbage) in
the heap". In other words, the amount of memory truly "used".
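
As a rough illustration of what "live set" means in practice, heap usage read immediately after a full collection is a reasonable approximation of it. The sketch below is a minimal example using the standard java.lang.management API. The class and method names are hypothetical, and it measures the JVM it runs in. For a running node you would read the same memory MXBean remotely over JMX (jconsole, for example, shows heap usage and has a button to request a GC).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
public class LiveSetEstimate {

    // Returns the heap bytes in use right after requesting a GC.
    // The GC request is only a hint to the JVM, so treat the result
    // as an approximation of the live set, not an exact figure.
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // same effect as System.gc()
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        long used = usedHeapAfterGc();
        // getMax() may return -1 if the maximum heap size is undefined.
        long max = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        System.out.printf("approx. live set: %d MB (max heap: %d MB)%n",
                used / (1024 * 1024), max / (1024 * 1024));
    }
}
```

If the figure after a GC stays close to the maximum heap size, the heap really is full of live data; if it drops far below, most of what was there was garbage.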

> Is the system CPU bound given the few statements
> below? This is from running 4 concurrent processes against the I
> need to throttle back the concurrent read/writers?
> I do all reads/writes as Quorum. (Replication factor of 3).

With quorum and 0.6.4 I don't think unthrottled writes are expected to
cause a problem.

> The memtable threshold is the default of 256.
> All caching is turned off.
> The database is pretty small, maybe a few million keys (2-3) in 4 CFs. The
> key size is pretty small. Some of the rows are pretty fat though (fatter
> than I thought). I am saving secondary indexes in separate CFs and those are
> the large rows that I think might be part of the problem. I will restart
> testing turning these off and see if I see any difference.
> Would an extra fat row explain repeated OOM crashes in a row? I have finally
> got the system to stabilize relatively and I even ran compaction on the bad
> node without a problem (still no row size stats).

Based on what you've said so far, the large rows are the only thing I
would suspect may be the cause. With the amount of data and keys you
say you have, you should definitely not be having memory issues with
an 8 gig heap as a direct result of the data size/key count. A few
million keys is not a lot at all; I still claim you should be able to
handle hundreds of millions at least, from the perspective of bloom
filters and such.
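
To put a rough number on the bloom filter part of that claim, here is a back-of-the-envelope sketch using the textbook sizing formula for an optimally configured bloom filter, bits per key = -ln(p) / (ln 2)^2. The 1% false positive rate and the class name are assumptions for illustration only; the parameters Cassandra actually uses may differ, so take the result as an order-of-magnitude estimate.

```java
// Hypothetical helper, for illustration only.
public class BloomFilterSizing {

    // Bits needed per key for an optimally sized bloom filter
    // with target false positive probability p (textbook formula).
    static double bitsPerKey(double p) {
        return -Math.log(p) / (Math.log(2) * Math.log(2));
    }

    // Total filter size in megabytes for the given number of keys.
    static double totalMegabytes(long keys, double p) {
        return keys * bitsPerKey(p) / 8.0 / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        double p = 0.01; // assumed false positive rate, not Cassandra's setting
        long[] keyCounts = {3_000_000L, 100_000_000L, 300_000_000L};
        for (long keys : keyCounts) {
            System.out.printf("%,d keys -> %.1f MB of bloom filter%n",
                    keys, totalMegabytes(keys, p));
        }
    }
}
```

Under these assumptions a few million keys cost only a few megabytes of filter, and even a hundred million keys stay in the low hundreds of megabytes, which is small next to an 8 gig heap.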

So your plan to try it without these large rows is probably a good
idea unless someone else has a better idea.

You may want to consider trying 0.7 betas too since it has removed the
limitation with respect to large rows, assuming you do in fact want
these large rows (see the CassandraLimitations wiki page that was
posted earlier in this thread).

> I now have several other nodes flapping with the following single error in
> the cassandra.log
> Error: Exception thrown by the agent : java.lang.NullPointerException
> I assume this is an unrelated problem?

Do you have a full stack trace?

/ Peter Schuller

