cassandra-user mailing list archives

From Bryan Talbot <>
Subject Re: constant CMS GC using CPU time
Date Tue, 23 Oct 2012 21:59:41 GMT
On Mon, Oct 22, 2012 at 6:05 PM, aaron morton <> wrote:

>> The GC was ongoing even when the nodes were not compacting or running a
>> heavy application load -- even when the main app was paused, the constant
>> GC continued.
> If you restart a node is the onset of GC activity correlated to some event?

Yes and no.  When the nodes were generally under the
.75 occupancy threshold a weekly "repair -pr" job would cause them to go
over the threshold and then stay there even after the repair had completed
and there were no ongoing compactions.  It acts as though at least some
substantial amount of memory used during repair was never dereferenced once
the repair was complete.

Once one CF in particular grew larger, the constant GC would start up pretty
soon (less than 90 minutes) after a node restart, even without a repair.

>> As a test we dropped the largest CF and the memory
>> usage immediately dropped to acceptable levels and the constant GC stopped.
>> So it's definitely related to data load. Memtable size is 1 GB, row cache
>> is disabled and key cache is small (default).
> How many keys did the CF have per node?
> I dismissed the memory used to hold bloom filters and index sampling.
> That memory is not considered part of the memtable size, and will end up in
> the tenured heap. It is generally only a problem with very large key counts
> per node.
I've changed the app to retain less data for that CF, but I think that it
was about 400M rows per node.  Row keys are a TimeUUID.  All of the rows
are write-once, never updated, and rarely read.  There are no secondary
indexes for this particular CF.

>> They were 2+ GB (as reported by nodetool cfstats anyway). It looks like
>> bloom_filter_fp_chance defaults to 0.0
> The default should be 0.000744.
> If the chance is zero or null, this code should run when a new SSTable is
> written:
>     // paranoia -- we've had bugs in the thrift <-> avro <-> CfDef dance before, let's not let that break things
>     logger.error("Bloom filter FP chance of zero isn't supposed to happen");
> Were the CFs migrated from an old version?
Yes, the CFs were created in 1.0.9, then migrated to 1.0.11 and finally to
1.1.5 with an "upgradesstables" run at each upgrade along the way.

I could not find a way to view the current bloom_filter_fp_chance settings
when they are at a default value.  JMX reports the actual fp rate, and if a
specific rate is set for a CF, that shows up in "describe table", but I
couldn't find out how to tell what the default was.  I didn't inspect the

>> Is there any way to predict how much memory the bloom filters will consume
>> if the size of the row keys, the number of rows, and the fp chance are
>> known?
> See o.a.c.utils.BloomFilter.getFilter() in the code
> This appears to give similar results.

Ahh, very helpful.  This indicates that 714MB would be used for the bloom
filter for that one CF.
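For comparison, the textbook closed-form Bloom filter sizing gives essentially the same figure. This Java sketch is an assumption, not Cassandra's code: it uses the optimal size m = n ln(1/p) / (ln 2)^2 bits, whereas o.a.c.utils.BloomFilter.getFilter() picks a buckets-per-element value from precomputed tables, so its result can differ somewhat. The class and method names (BloomSize, optimalBits, optimalMiB, optimalHashes) are hypothetical.

```java
// Textbook optimal Bloom filter sizing: m = n * ln(1/p) / (ln 2)^2 bits.
// Assumption: this approximates, but is not, Cassandra's BloomFilter.getFilter(),
// which chooses buckets per element from precomputed tables.
public class BloomSize {

    /** Optimal number of bits for n keys at false-positive probability p, 0 < p <= 1. */
    static double optimalBits(long n, double p) {
        if (p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        // ln(1/p) instead of -ln(p) so that p = 1.0 yields +0.0 rather than -0.0
        return n * Math.log(1.0 / p) / (Math.log(2) * Math.log(2));
    }

    /** The same size expressed in MiB (2^20 bytes). */
    static double optimalMiB(long n, double p) {
        return optimalBits(n, p) / 8.0 / (1024.0 * 1024.0);
    }

    /** Optimal number of hash functions: k = (m / n) * ln 2. */
    static double optimalHashes(long n, double p) {
        return optimalBits(n, p) / n * Math.log(2);
    }

    public static void main(String[] args) {
        long rows = 400_000_000L; // about 400M row keys per node, as estimated above
        double fp = 0.000744;     // the default fp chance quoted in the thread
        System.out.printf("bits per key: %.2f%n", optimalBits(rows, fp) / rows);
        System.out.printf("hashes:       %.1f%n", optimalHashes(rows, fp));
        System.out.printf("size:         %.1f MiB%n", optimalMiB(rows, fp));
    }
}
```

For 400M keys at 0.000744 this works out to about 15 bits per key and roughly 714.9 MiB, in line with the 714MB estimate.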

JMX / cfstats reports "Bloom Filter Space Used", but the MBean method name
(getBloomFilterDiskSpaceUsed) indicates this is the on-disk space. If the
on-disk and in-memory sizes are similar, then summing all the "Bloom
Filter Space Used" values says the filters are currently consuming 1-2 GB
of the heap, which is substantial.

If a CF is rarely read, is it safe to set bloom_filter_fp_chance to 1.0?  It
just means more trips to the SSTable indexes for a read, correct?  Trade RAM
for time (disk I/O).
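A rough model of that trade-off, offered as an assumption and not as a description of how Cassandra 1.1.5 actually handles a value of 1.0: a filter with false-positive chance p answers "maybe present" for a key that an SSTable does not hold with probability p. A read that has to rule out S such SSTables therefore wastes about S * p index lookups, while the filter costs n ln(1/p) / (ln 2)^2 bits. The value S = 10 and the names FpTradeoff, filterMiB and wastedIndexLookups are hypothetical.

```java
// Rough RAM vs. disk I/O trade-off for a Bloom filter false-positive chance p.
// Assumptions (hypothetical, not taken from Cassandra): textbook optimal sizing,
// and a read that must check sstablesWithoutKey SSTables that do not hold the key.
public class FpTradeoff {

    /** Optimal filter size in MiB: m = keys * ln(1/p) / (ln 2)^2 bits. */
    static double filterMiB(long keys, double p) {
        double bits = keys * Math.log(1.0 / p) / (Math.log(2) * Math.log(2));
        return bits / 8.0 / (1024.0 * 1024.0);
    }

    /** Expected number of SSTable index lookups caused by false positives. */
    static double wastedIndexLookups(int sstablesWithoutKey, double p) {
        return sstablesWithoutKey * p;
    }

    public static void main(String[] args) {
        long keys = 400_000_000L;    // about 400M rows per node, as estimated above
        int sstablesWithoutKey = 10; // hypothetical
        double[] fpChances = {0.000744, 0.01, 0.1, 1.0};
        System.out.println("fp_chance   filter MiB   wasted index lookups/read");
        for (double p : fpChances) {
            System.out.printf("%-10.6f  %10.1f   %6.3f%n",
                    p, filterMiB(keys, p), wastedIndexLookups(sstablesWithoutKey, p));
        }
    }
}
```

Under this model each tenfold increase in p saves about ln(10) / (ln 2)^2 ≈ 4.8 bits per key (roughly 228 MiB for 400M keys), and p = 1.0 means no filter memory but an index lookup in every candidate SSTable.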

