cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: Node OOM Problems
Date Thu, 19 Aug 2010 20:26:03 GMT
So, these:

>  INFO [GC inspection] 2010-08-19 16:34:46,656 GCInspector.java (line 116) GC
> for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed leaving 8326856720
> used; max is 8700035072
[snip]
>  INFO [GC inspection] 2010-08-19 16:36:00,786 GCInspector.java (line 116) GC
> for ConcurrentMarkSweep: 37122 ms, 157488 reclaimed leaving 8342836376
> used; max is 8700035072

...show that your live set is indeed very close to the heap maximum,
so concurrent mark/sweep phases run often while freeing very little
memory. In addition, the fact that it seems to take 35-45 seconds to
do a concurrent mark/sweep on an 8 gig heap on a modern system
suggests that you are probably CPU bound in cassandra at the time
(which makes GC slower).
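
To put rough numbers on it, taking the first log line above:

    8326856720 / 8700035072  ~=  96%  of the heap still live after the collection
     192522712 / 8700035072  ~=   2%  of the heap reclaimed, for ~41 seconds of work

And the second collection reclaimed almost nothing at all (157488
bytes). So each CMS cycle buys you essentially nothing before the next
one has to start.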

In short, you're using too much memory relative to the maximum heap
size. The expected result is either an OOM, or becoming too slow due
to excessive GC activity (usually the latter followed by the former).

Now, the question is what memory is used *for*, and why. First off, to
get that out of the way, are you inserting with consistency level
ZERO? I am not sure whether it applies to 0.6.4, but there used to be
an issue where writes at consistency level ZERO were not throttled at
all, meaning that if you threw writes at the system faster than it
could handle them, you would accumulate memory use. I don't believe
this is a problem with CL.ONE and above, even in 0.6.4 (but someone
correct me if I'm wrong).
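
For reference, the consistency level is just the last argument to the
Thrift insert call. A minimal sketch against the 0.6 Thrift API,
assuming the stock Keyspace1/Standard1 sample schema (the host, key
and column names are placeholders):

    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class InsertExample {
        public static void main(String[] args) throws Exception {
            TTransport tr = new TSocket("localhost", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(tr));
            tr.open();
            ColumnPath path = new ColumnPath("Standard1");
            path.setColumn("col".getBytes("UTF-8"));
            // ONE (or higher) blocks until at least one replica has
            // acknowledged the write; ZERO returns immediately, so an
            // aggressive writer can outrun the node and pile up memory.
            client.insert("Keyspace1", "row1", path,
                          "value".getBytes("UTF-8"),
                          System.currentTimeMillis(),
                          ConsistencyLevel.ONE);
            tr.close();
        }
    }

If your client library hides this argument, check what default it
passes on your behalf.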

(As an aside: I'm not sure whether the behavior was such that it might
explain OOM on restart as a result of accumulated commitlogs that get
replayed faster than memtable flushing happens. Perhaps not, not
sure.)

In any case, the most important factor is what you're actually doing
with the cluster, but you don't say much about the data; in
particular, how many rows and columns you're populating it with.

The primary users of large amounts of memory in cassandra include
(hopefully I'm not missing something major):

* bloom filters that are used to efficiently avoid doing I/O on
sstables that do not contain relevant data. the size of the bloom
filters scales linearly with the number of row keys (not columns,
right? I don't remember). so here we have an expected permanent, but
low, memory use as a result of a large database. how large is your
database? 100 million keys? 1 billion? 10 billion?

* the memtables; the currently active memtable and any memtables
currently undergoing flushing. the size of these is directly
controllable in the configuration file (see the sketch after this
list); make sure they are reasonable. (If you're not sure at all, with
an 8 gig heap I'd say <= 512 mb is a reasonable recommendation unless
you have a reason to make them larger)

* row cache and key cache, both controllable in the configuration. in
particular the row cache can be huge if you have configured it as
such.

* to some extent unflushed commitlogs; the commit log rotation
threshold controls this. the default value is low enough that it
should not be your culprit
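
As a concrete reference point, here is roughly what those knobs look
like in a 0.6-era storage-conf.xml. This is a sketch from memory, so
treat the element names as assumptions and check them against your own
file; the values are only illustrative:

    <!-- per-memtable flush thresholds; whichever is hit first wins -->
    <MemtableThroughputInMB>256</MemtableThroughputInMB>
    <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>

    <!-- commit log segment rotation; the default is small enough
         to be harmless -->
    <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>

    <!-- key/row caches are set per column family; RowsCached in
         particular can eat arbitrary amounts of heap if set large -->
    <ColumnFamily Name="Standard1"
                  CompareWith="BytesType"
                  KeysCached="200000"
                  RowsCached="0"/>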

So the question is what your usage is like. How many unique rows do
you have? How many columns? The data size in and of itself should not
matter much to memory use, except of course that extremely large
individual values will cause transient spikes in memory use when they
are read/written.

In general, lacking large row caches and such things, you should be
able to have hundreds of millions of entries on an 8 gb heap, assuming
reasonably sized keys.
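
For a very rough sense of scale, assuming something on the order of 2
bytes of bloom filter per row key (a ballpark assumption on my part,
not an exact figure for 0.6):

    100 million keys  ->  ~200 MB of bloom filters
      1 billion keys  ->  ~2 GB of bloom filters

So at a billion-plus keys the bloom filters alone start to matter on
an 8 gig heap, while at hundreds of millions they should not.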

-- 
/ Peter Schuller
