cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Flush / Snapshot Triggering Full GCs, Leaving Ring
Date Thu, 07 Apr 2011 20:43:40 GMT
On Thu, Apr 7, 2011 at 2:27 PM, Erik Onnen <> wrote:
> 1) Does this seem like a sane amount of garbage (512MB) to generate
> when flushing a 64MB table to disk?

Sort of -- that's just about exactly the amount of space you'd expect
64MB of serialized data to take in memory. (Not very efficient, I
know.)  So you would expect roughly that much to become available to
GC after a flush.

Also, flush allocates a buffer equal to in_memory_compaction_limit, so
that will also generate a spike.  I think you upgraded from 0.6 -- if
the converter turned the row size warning limit into
in_memory_compaction_limit, then it could be much larger.
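For reference, this is the cassandra.yaml knob in question -- a sketch assuming a 0.7-era config, with an illustrative value:

```yaml
# cassandra.yaml -- flush allocates a buffer of this size, so a large
# value here translates directly into an allocation spike at flush time.
in_memory_compaction_limit_in_mb: 64
```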

Otherwise, I'm not sure why flush would consume that much *extra*.  It
smells like something unexpected in the flush code to me, but I don't
see anything obvious: SSTableWriter serializes directly to the output
stream without (m)any other allocations.
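To illustrate that claim, here is a minimal, hypothetical sketch (not Cassandra's actual SSTableWriter code) of serializing a row straight to a DataOutputStream -- each field goes directly to the stream, with no per-row intermediate byte[] copies:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DirectSerialization {

    // Hypothetical row format: length-prefixed key, then length-prefixed
    // value, written straight to the stream with no intermediate buffers.
    static void writeRow(DataOutputStream out, String key, byte[] value)
            throws IOException {
        out.writeUTF(key);           // 2-byte length + UTF-8 key bytes
        out.writeInt(value.length);  // 4-byte value length
        out.write(value);            // value bytes, written in place
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        writeRow(out, "row1", new byte[]{1, 2, 3});
        out.flush();
        // writeUTF("row1") = 2 + 4 bytes, writeInt = 4 bytes, value = 3 bytes
        System.out.println(sink.size()); // 13
    }
}
```

The point of the pattern is that the only steady-state allocation is the stream's own buffer; the garbage generated per row is near zero.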

> 2) Is this possibly a case of the MaxTenuringThreshold=1 working
> against cassandra? The flush seems to create a lot of garbage very
> quickly such that normal CMS isn't even possible. I'm sure there was a
> reason to introduce this setting but I'm not sure it's universally
> beneficial. Is there any history on the decision to opt for immediate
> promotion rather than using an adaptable number of survivor
> generations?

The history is that, way back in the early days, we maxed it out the
other way (MTT=128), but the observed behavior is that objects
surviving one new-gen collection are very likely to survive "forever."
This fits what we expect theoretically: read requests and the
ephemera from write requests live for a small number of ms, but
memtable data is not GC-able until flush. (Row cache data, of course,
is effectively unbounded in tenure.)  Keeping long-lived data in a
survivor space just makes new-gen collections take longer, since you
are copying that data back and forth over and over.

(We have advised some read-heavy customers to ramp up to MTT=16, so
it's not a hard-and-fast rule, but it still feels like a reasonable
starting point to me.)
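For anyone wanting to experiment, the knob is a standard HotSpot JVM option, typically set in cassandra-env.sh (the exact file varies by version). A sketch:

```
# Promote objects to the old gen after a single new-gen collection
# (the MTT=1 default discussed above).
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"

# Read-heavy alternative mentioned above: let objects age longer in
# the survivor spaces before promotion.
# JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"
```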

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
