incubator-cassandra-user mailing list archives

From ruslan usifov <>
Subject Re: Flush / Snapshot Triggering Full GCs, Leaving Ring
Date Thu, 07 Apr 2011 12:33:15 GMT
2011/4/7 Jonathan Ellis <>

> Hypothesis: it's probably the flush causing the CMS, not the snapshot
> linking.
> Confirmation possibility #1: Add a logger.warn to
> CLibrary.createHardLinkWithExec -- with JNA enabled it shouldn't be
> called, but let's rule it out.
> Confirmation possibility #2: Force some flushes w/o snapshot.
> Either way: "concurrent mode failure" is the easy GC problem.
> Hopefully you really are seeing mostly that -- this means the JVM
> didn't start CMS early enough, so it ran out of space before it could
> finish the concurrent collection, so it falls back to stop-the-world.
> The fix is a combination of reducing XX:CMSInitiatingOccupancyFraction
> and (possibly) increasing heap capacity if your heap is simply too
> full too much of the time.
> You can also mitigate it by increasing the phi threshold for the
> failure detector, so the node doing the GC doesn't mark everyone else
> as dead.
> (Eventually your heap will fragment and you will see STW collections
> due to "promotion failed," but you should see that much less
> frequently. GC tuning to reduce fragmentation may be possible based on
> your workload, but that's out of scope here and in any case the "real"
> fix for that is
Jonathan, do you have plans to backport this to the 0.7 branch? (CMS is very
hard to tune, and for people who are new to Java the task becomes much
harder.)
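
For anyone trying the tuning described in the quoted message, here is a minimal sketch of what lowering the CMS initiating occupancy could look like in a shell fragment in the style of conf/cassandra-env.sh. The JVM flags are standard HotSpot options; the value 65 is purely illustrative and an assumption on my part, not a recommendation from the thread. The right number depends on how full your heap normally runs.

```shell
# Sketch only: append CMS tuning flags to the JVM options Cassandra starts with.
# JVM_OPTS is assumed to be the variable your startup script passes to java.
JVM_OPTS="${JVM_OPTS:-}"

# Use the concurrent mark-sweep collector for the old generation.
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"

# Start the concurrent collection when the old generation is 65% full
# (illustrative value; lower means CMS starts earlier).
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"

# Make the JVM honour the fraction above instead of its own heuristics.
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# The failure-detector mitigation lives in a different file. Assuming your
# version exposes it, conf/cassandra.yaml would carry a line such as:
#   phi_convict_threshold: 12
```

A GC log (for example via -XX:+PrintGCDetails) should show whether "concurrent mode failure" entries become less frequent after the change.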
