cassandra-user mailing list archives

From Jonathan Ellis
Subject Re: Flush / Snapshot Triggering Full GCs, Leaving Ring
Date Thu, 07 Apr 2011 01:42:53 GMT
Hypothesis: it's probably the flush causing the CMS, not the snapshot linking.

Confirmation possibility #1: Add a logger.warn to
CLibrary.createHardLinkWithExec -- with JNA enabled it shouldn't be
called, but let's rule it out.

Confirmation possibility #2: Force some flushes w/o snapshot.

Either way: "concurrent mode failure" is the easy GC problem.
Hopefully you really are seeing mostly that -- it means the JVM
didn't start CMS early enough, so the old generation ran out of space
before the concurrent collection could finish and the JVM fell back
to a stop-the-world collection. The fix is a combination of reducing
-XX:CMSInitiatingOccupancyFraction and (possibly) increasing heap
capacity if your heap is simply too full too much of the time.
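
To put a number on "too full": the figures below are copied from the
"concurrent mode failure" line in the quoted GC log further down. The
helper name is made up for illustration, and this is a rough sketch of
the arithmetic, not a tuning recipe:

```python
# Figures copied from the "concurrent mode failure" line in the quoted GC log.
old_gen_used_kb = 3904635      # old generation occupancy when CMS gave up
old_gen_capacity_kb = 4109120  # old generation capacity
old_gen_after_kb = 1700629     # occupancy after the stop-the-world collection


def occupancy_percent(used_kb, capacity_kb):
    """Return occupancy as a percentage of capacity (hypothetical helper)."""
    return 100.0 * used_kb / capacity_kb


at_failure = occupancy_percent(old_gen_used_kb, old_gen_capacity_kb)
after_full_gc = occupancy_percent(old_gen_after_kb, old_gen_capacity_kb)

print(f"occupancy at failure:    {at_failure:.1f}%")
print(f"occupancy after full GC: {after_full_gc:.1f}%")
```

On those numbers the old generation was about 95% full when the
concurrent collection gave up, and about 41% full after the
stop-the-world pass. That suggests there is room to start CMS
considerably earlier, though the right fraction depends on the
allocation rate during a flush, which this arithmetic does not capture.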

You can also mitigate it by increasing the phi threshold for the
failure detector, so the node doing the GC doesn't mark everyone else
as dead.
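
As a rough illustration of what raising the threshold buys, assume
heartbeat inter-arrival times are exponentially distributed, so that
phi = -log10(exp(-t / mean)). That model, the 1-second mean interval,
and the threshold values 8 and 12 are assumptions for the sketch, not
values taken from this cluster:

```python
import math


def phi(seconds_since_last_heartbeat, mean_interval_seconds):
    """Suspicion level under an exponential inter-arrival model (an assumption).

    P(next heartbeat arrives later than t) = exp(-t / mean), phi = -log10(P).
    """
    return seconds_since_last_heartbeat / (mean_interval_seconds * math.log(10))


def max_silence_seconds(phi_threshold, mean_interval_seconds):
    """Longest silence for which phi stays at or below the threshold."""
    return phi_threshold * mean_interval_seconds * math.log(10)


MEAN_INTERVAL = 1.0  # hypothetical: one heartbeat per second on average

for threshold in (8, 12):  # hypothetical threshold values
    tolerated = max_silence_seconds(threshold, MEAN_INTERVAL)
    print(f"phi threshold {threshold}: tolerates {tolerated:.1f}s of silence")
```

Under those assumptions a threshold of 8 tolerates about 18 seconds of
silence and 12 about 28 seconds. The tolerated silence scales with the
observed mean interval, so a cluster whose heartbeats arrive more often
than once a second will convict sooner than these figures suggest.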

(Eventually your heap will fragment and you will see STW collections
due to "promotion failed," but you should see that much less
frequently. GC tuning to reduce fragmentation may be possible based on
your workload, but that's out of scope here and in any case the "real"
fix for that is

On Wed, Apr 6, 2011 at 2:07 PM, C. Scott Andreas wrote:
> Hello,
> We're running a six-node 0.7.4 ring in EC2 on m1.xlarge instances
> with 4GB heap (15GB total memory, 4 cores, dataset fits in RAM,
> storage on ephemeral disk). We've noticed a brief flurry of query
> failures during the night corresponding with our backup schedule.
> More specifically, our logs suggest that calling "nodetool snapshot"
> on a node is triggering 12 to 16 second CMS GCs and a promotion
> failure resulting in a full stop-the-world collection, during which
> the node is marked dead by the ring until re-joining shortly after.
> Here's a log from one of the nodes, along with system info and JVM options:
> At 13:15:00, our backup cron job runs, which calls nodetool flush,
> then nodetool snapshot. (After investigating, we noticed that calling
> both flush and snapshot is unnecessary, and have since updated the
> script to only call snapshot). While writing memtables, we'll
> generally see a GC logged out via Cassandra such as:
> "GC for ConcurrentMarkSweep: 16113 ms, 1755422432 reclaimed leaving 1869123536 used; max is 4424663040."
> In the JVM GC logs, we'll often see a tenured promotion failure
> occurring during this collection, resulting in a full stop-the-world
> GC like this (different node):
> 1180629.380: [CMS1180634.414: [CMS-concurrent-mark: 6.041/6.468 secs] [Times: user=8.00 sys=0.10, real=6.46 secs]
>  (concurrent mode failure): 3904635K->1700629K(4109120K), 16.0548910 secs] 3958389K->1700629K(4185792K), [CMS Perm : 19610K->19601K(32796K)], 16.1057040 secs] [Times: user=14.39 sys=0.02, real=16.10
> During the GC, the rest of the ring will shun the node, and when the
> collection completes, the node will mark all other hosts in the ring
> as dead. The node and ring stabilize shortly after once detecting
> each other as up and completing hinted handoff (details in log).
> We've enabled JNA on one of the nodes to prevent forking a subprocess
> to call `ln` during a snapshot yesterday and still observed a
> concurrent mode failure collection following a flush/snapshot, but
> the CMS length was shorter (9 seconds) and did not result in the node
> being shunned from the ring.
> While the query failures that result from this activity are brief,
> our retry threshold is set to 6 for timeout exceptions. We're
> concerned that we're exceeding that, and would like to figure out why
> we see long CMS collections + promotion failures triggering full GCs
> during a snapshot.
> Has anyone seen this, or have suggestions on how to prevent full GCs
> from occurring during a flush / snapshot?
> Thanks,
> - Scott
> ---
> C. Scott Andreas
> Engineer, Urban Airship, Inc.

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
