cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: Node OOM Problems
Date Thu, 19 Aug 2010 21:37:36 GMT
On Thu, Aug 19, 2010 at 4:49 PM, Wayne <> wrote:
> What is my "live set"? Is the system CPU bound given the few statements
> below? This is from running 4 concurrent processes against the I
> need to throttle back the concurrent read/writers?
> I do all reads/writes as Quorum. (Replication factor of 3).
> The memtable threshold is the default of 256.
> All caching is turned off.
> The database is pretty small, maybe a few million keys (2-3) in 4 CFs. The
> key size is pretty small. Some of the rows are pretty fat though (fatter
> than I thought). I am saving secondary indexes in separate CFs and those are
> the large rows that I think might be part of the problem. I will restart
> testing turning these off and see if I see any difference.
> Would an extra fat row explain repeated OOM crashes in a row? I have finally
> got the system to stabilize relatively and I even ran compaction on the bad
> node without a problem (still no row size stats).
> I now have several other nodes flapping with the following single error in
> the cassandra.log
> Error: Exception thrown by the agent : java.lang.NullPointerException
> I assume this is an unrelated problem?
> Thanks for all of your help!
> On Thu, Aug 19, 2010 at 10:26 PM, Peter Schuller
> <> wrote:
>> So, these:
>> > INFO [GC inspection] 2010-08-19 16:34:46,656 (line 116) GC for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed leaving 8326856720 used; max is 8700035072
>> [snip]
>> > INFO [GC inspection] 2010-08-19 16:36:00,786 (line 116) GC for ConcurrentMarkSweep: 37122 ms, 157488 reclaimed leaving 8342836376 used; max is 8700035072
>> indicate that your live set is indeed very close to heap maximum, and
>> so concurrent mark/sweep phases run often freeing very little memory.
>> In addition the fact that it seems to take 35-45 seconds to do the
>> concurrent mark/sweep on an 8 gig heap on a modern system suggests
>> that you are probably CPU bound in cassandra at the time (meaning GC
>> is slower).
>> In short you're using too much memory in comparison to the maximum
>> heap size. The expected result is to either get an OOM, or just become
>> too slow due to excessive GC activity (usually the latter followed by
>> the former).
>> Now, the question is what memory is used *for*, and why. First off, to
>> get that out of the way, are you inserting with consistency level
>> ZERO? I am not sure whether it applies to 0.6.4 or not but there used
>> to be an issue involving writes at consistency level ZERO not being
>> throttled at all, meaning that if you threw writes at the system
>> faster than it would handle them, you would accumulate memory use. I
>> don't believe this is a problem with CL.ONE and above, even in 0.6.4
>> (but someone correct me if I'm wrong).
>> (As an aside: I'm not sure whether the behavior was such that it might
>> explain OOM on restart as a result of accumulated commitlogs that get
>> replayed faster than memtable flushing happens. Perhaps not, not
>> sure.)
>> In any case, the most important factors are what you're actually doing
>> with the cluster, but you don't say much about the data. In particular
>> how many rows and columns you're populating it with.
>> The primary users of large amounts of memory in cassandra include
>> (hopefully I'm not missing something major):
>> * bloom filters that are used to efficiently avoid doing I/O on
>> sstables that do not contain relevant data. the size of bloom filters
>> scales linearly with the number of row keys (not columns right? I don't
>> remember). so here we have an expected permanent, but low, memory use
>> as a result of a large database. how large is your database? 100
>> million keys? 1 billion? 10 billion?
>> * the memtables; the currently active memtable and any memtables
>> currently undergoing flushing. the size of these is directly
>> controllable in the configuration file. make sure they are reasonable.
>> (If you're not sure at all, with an 8 gig heap I'd say <= 512 mb is a
>> reasonable recommendation unless you have a reason to make them
>> larger)
>> * row cache and key cache, both controllable in the configuration. in
>> particular the row cache can be huge if you have configured it as
>> such.
>> * to some extent unflushed commitlogs; the commit log rotation
>> threshold controls this. the default value is low enough that it
>> should not be your culprit
>> So the question is what your usage is like. How many unique rows do
>> you have? How many columns? The data size in and of itself should not
>> matter much to memory use, except of course that extremely large
>> individual values will be relevant to transient high memory use when
>> they are read/written.
>> In general, lacking large row caches and such things, you should be
>> able to have hundreds of millions of entries on an 8 gb heap, assuming
>> reasonably sized keys.
>> --
>> / Peter Schuller
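
To put rough numbers on Peter's point that bloom filter size scales
linearly with the number of row keys, here is a minimal sketch using the
textbook bloom filter sizing formula. The 1% false-positive rate is an
assumption picked for illustration, not a value taken from Cassandra's
configuration or source, so treat the results as order-of-magnitude only.

```python
import math


def bloom_filter_bytes(num_keys, false_positive_rate):
    """Textbook optimal bloom filter size: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


if __name__ == "__main__":
    # ASSUMPTION: 1% false-positive rate, chosen only for illustration.
    assumed_fp_rate = 0.01
    # Key counts mentioned in this thread: "2-3 million", 100 million,
    # 1 billion, 10 billion.
    for keys in (3_000_000, 100_000_000, 1_000_000_000, 10_000_000_000):
        size_mb = bloom_filter_bytes(keys, assumed_fp_rate) / 2**20
        print(f"{keys:>14,} keys -> about {size_mb:,.0f} MB of bloom filter")
```

Under that assumption the cost works out to a little under 10 bits per
key, so a database of a few million keys should spend only a few MB on
bloom filters.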

The "live set" is the active data. For example, I may have 900 GB of data
on disk, but at a given time X, 10 GB are being read, written, or
replicated. My "live set" would be the 10 GB.
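
To see how close the heap is to its maximum, the two GC inspection
messages quoted above can be turned into an occupancy figure. This is a
small sketch, not a Cassandra tool: the regular expression is an
assumption based only on the two messages in this thread, and the
wrapped log lines have been joined by hand.

```python
import re

# The two GC inspection messages quoted in this thread, joined onto one line each.
LOG_LINES = [
    "GC for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed "
    "leaving 8326856720 used; max is 8700035072",
    "GC for ConcurrentMarkSweep: 37122 ms, 157488 reclaimed "
    "leaving 8342836376 used; max is 8700035072",
]

# ASSUMPTION: pattern inferred from the two messages above only.
GC_PATTERN = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, (?P<reclaimed>\d+) reclaimed "
    r"leaving (?P<used>\d+) used; max is (?P<max>\d+)"
)


def summarize_gc_line(line):
    """Return pause length, memory reclaimed, and heap occupancy after the GC."""
    match = GC_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a GC inspection message: {line!r}")
    used = int(match["used"])
    heap_max = int(match["max"])
    return {
        "collector": match["collector"],
        "seconds": int(match["ms"]) / 1000,
        "reclaimed_mb": int(match["reclaimed"]) / 2**20,
        "occupancy": used / heap_max,
    }


if __name__ == "__main__":
    for line in LOG_LINES:
        s = summarize_gc_line(line)
        print(
            f"{s['collector']}: {s['seconds']:.1f} s, "
            f"reclaimed {s['reclaimed_mb']:.1f} MB, "
            f"heap {s['occupancy']:.1%} full afterwards"
        )
```

Both collections leave the heap more than 95% full, which is the
situation Peter describes: long collections that free very little.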

> Would an extra fat row explain repeated OOM crashes in a row?

Highly likely. I had a row that was 112 MB and 4,000,000+ columns. It
caused havoc for me. Read what Peter described above. It may not be
the problem but that is a place to start looking.
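
As a back-of-the-envelope check, compare the free heap left after the
first GC message in this thread with a row the size of mine. This is only
a sketch: it assumes 112 MB means 112 * 2**20 bytes, and `overhead_factor`
is a hypothetical multiplier for how much larger a row is on the heap than
its serialized size. I have not measured that factor; 1.0 is a lower
bound.

```python
# Figures from the first GC inspection message in this thread.
HEAP_MAX_BYTES = 8700035072
HEAP_USED_BYTES = 8326856720

# My fat row. ASSUMPTION: "112 MB" is taken as 112 * 2**20 bytes.
FAT_ROW_BYTES = 112 * 2**20


def copies_that_fit(headroom_bytes, row_bytes, overhead_factor=1.0):
    """How many copies of a row fit in the remaining heap.

    overhead_factor is a HYPOTHETICAL in-memory expansion multiplier;
    1.0 means the row costs exactly its serialized size on the heap.
    """
    return int(headroom_bytes // (row_bytes * overhead_factor))


if __name__ == "__main__":
    headroom = HEAP_MAX_BYTES - HEAP_USED_BYTES
    print(f"free heap after GC: {headroom / 2**20:.0f} MB")
    for factor in (1.0, 2.0, 4.0):
        n = copies_that_fit(headroom, FAT_ROW_BYTES, factor)
        print(f"overhead x{factor}: room for {n} copies of the row")
```

With that little headroom, a handful of concurrent operations touching a
row this large could be enough to exhaust the heap, which is why row
size is worth checking first.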
