incubator-cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: Node OOM Problems
Date Thu, 19 Aug 2010 20:39:50 GMT
> of a rogue large row is one I never considered. The largest row on the other
> nodes is as much as 800megs. I can not get a cfstats reading on the bad node

With 0.6 I can definitely see this being a problem if I understand its
behavior correctly (I have not actually used 0.6 even for testing). In
particular, that much data is likely to end up directly in the old
generation of the GC: the young generation is normally smaller than
800 MB, and that does not take into account the time it takes to
actually read and process such large rows, or the likelihood of a
young-generation GC triggering anyway due to other normal activity.
Having a single value be 10% of the total heap size is likely to be
problematic in general. The same could be said of malloc()/free() in
some cases (e.g. a 32-bit virtual address space with fragmentation
issues); algorithms solving general allocation problems are often not
very good at dealing with extreme outliers.
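To check how the young generation actually compares to a row of that size on a given node, one can dump the heap pool limits via the standard java.lang.management API. A minimal sketch (the class name is mine, and the exact pool names vary by collector):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class YoungGenCheck {
    public static void main(String[] args) {
        // List the heap memory pools and their maximum sizes. If a single
        // value is larger than what fits in the Eden/young pools, the JVM
        // allocates it directly in the old generation.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            long max = pool.getUsage().getMax();
            System.out.printf("%-24s max = %s%n", pool.getName(),
                    max < 0 ? "unbounded" : (max / (1024 * 1024)) + " MB");
        }
    }
}
```

Comparing those numbers against the 800 MB row gives a quick sanity check on whether such a value can avoid being allocated straight into the old generation.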

> so do not know how big its largest row is. I will raise memory to 16gb and
> see if that makes a difference. I had though that the java heap sizes that
> high had issues on their own in term of GC.

The garbage collector may or may not have issues in particular cases,
and to some extent the heap size is definitely a factor. However, a
lot of other things play in, including the application's overall
allocation behavior and pointer writing behavior. A large heap size in
and of itself should not be a huge problem; if you combine a very
large heap size with lots of allocation and lots of behavior that is
difficult for the particular GC to deal with, you may be more likely
to have problems.

My gut feeling with Cassandra is that I expect it to be fine, with the
worst case being that one has to tweak GC settings, e.g. to make the
concurrent mark/sweep phases kick in earlier. In other words, I would
not expect Cassandra to be an application where it becomes problematic
to keep CMS pause times down. However, I have no hard evidence of
that.
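For reference, making CMS kick in earlier usually means lowering the initiating occupancy threshold. A hedged sketch of the JVM options involved (the 60% value is purely illustrative, not a recommendation, and where you set JVM_OPTS depends on your startup script):

```shell
# Illustrative CMS settings; tune the threshold for your own workload.
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=60"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```

Without the last flag, the JVM only uses the given fraction as a starting hint and then adjusts the trigger point based on runtime statistics.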

I'd be very interested to hear if people have other experiences in
production environments with very large heap sizes.

-- 
/ Peter Schuller
