cassandra-user mailing list archives

From "B. Todd Burruss" <bto...@gmail.com>
Subject Re: constant CMS GC using CPU time
Date Wed, 24 Oct 2012 03:51:31 GMT
Regarding memory usage after a repair: are the Merkle trees kept around?
On Oct 23, 2012 3:00 PM, "Bryan Talbot" <btalbot@aeriagames.com> wrote:

> On Mon, Oct 22, 2012 at 6:05 PM, aaron morton <aaron@thelastpickle.com> wrote:
>
>> The GC was ongoing even when the nodes were not compacting or running a
>> heavy application load; even when the main app was paused, the constant GC
>> continued.
>>
>> If you restart a node, is the onset of GC activity correlated to some
>> event?
>>
>
> Yes and no.  When the nodes were generally under the
> .75 occupancy threshold, a weekly "repair -pr" job would push them over the
> threshold, and they would then stay there even after the repair had completed
> and there were no ongoing compactions.  It acts as though a substantial
> amount of the memory used during repair was never released once
> the repair was complete.
>
> Once one CF in particular grew larger, the constant GC would start up
> soon after a node restart (within 90 minutes), even without a
> repair.
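The .75 threshold is presumably CMS's initiating occupancy fraction (Cassandra's stock cassandra-env.sh sets -XX:CMSInitiatingOccupancyFraction=75 together with -XX:+UseCMSInitiatingOccupancyOnly). If the old generation stays above that level after each collection, CMS keeps starting new cycles, which would look like the "constant GC" described here. A minimal sketch for watching tenured occupancy against that fraction, assuming the pool names the JVM typically reports ("CMS Old Gen" under CMS); note it inspects the JVM it runs in, so checking a Cassandra node would need a remote JMX connection, which is not shown:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenOccupancy {
    // Assumed to match -XX:CMSInitiatingOccupancyFraction=75.
    static final double CMS_THRESHOLD = 0.75;

    // Fraction of the pool's maximum currently in use; -1 if the max is undefined (getMax() == -1).
    static double occupancy(long used, long max) {
        return max <= 0 ? -1.0 : (double) used / max;
    }

    // True when the pool is at or above the assumed CMS initiating threshold.
    static boolean overThreshold(long used, long max) {
        return occupancy(used, max) >= CMS_THRESHOLD;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Only look at heap pools that look like the tenured generation
            // ("CMS Old Gen", "PS Old Gen", "G1 Old Gen", "Tenured Gen", ...).
            if (pool.getType() != MemoryType.HEAP
                    || !(name.contains("Old Gen") || name.contains("Tenured"))) {
                continue;
            }
            MemoryUsage u = pool.getUsage();
            if (u == null) continue; // pool no longer valid
            System.out.printf("%s: occupancy %.2f, over threshold: %b%n",
                    name, occupancy(u.getUsed(), u.getMax()), overThreshold(u.getUsed(), u.getMax()));
        }
    }
}
```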
>
>
>
>
>>
>>
>> As a test we dropped the largest CF, and the memory
>> usage immediately dropped to acceptable levels and the constant GC stopped.
>> So it's definitely related to data load.  Memtable size is 1 GB, row cache
>> is disabled, and key cache is small (default).
>>
>> How many keys did the CF have per node?
>> I dismissed the memory used to hold bloom filters and index sampling.
>> That memory is not considered part of the memtable size, and will end up in
>> the tenured heap. It is generally only a problem with very large key counts
>> per node.
>>
>>
> I've changed the app to retain less data for that CF, but I think that it
> was about 400M rows per node.  Row keys are TimeUUIDs.  All of the rows
> are write-once, never updated, and rarely read.  There are no secondary
> indexes for this particular CF.
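A back-of-the-envelope sketch of the index-sampling side of Aaron's point, using the ~400M keys per node and 16-byte TimeUUID keys from the reply above. It assumes an index_interval of 128 (the cassandra.yaml default), one sample per interval, and roughly key bytes plus an 8-byte position per sample; real JVM object overhead (headers, token objects) is not modeled and is left as a hypothetical parameter. Since the rows are write-once, keys should rarely repeat across SSTables, so treating the node's keys as one pool seems reasonable:

```java
public class IndexSampleEstimate {
    // One sampled index entry is kept on heap for every indexInterval keys (rounded up).
    static long samples(long keys, int indexInterval) {
        return (keys + indexInterval - 1) / indexInterval;
    }

    // Key bytes plus an assumed 8-byte position (a long) per sample, plus a caller-supplied
    // guess at per-entry JVM overhead; pass 0 to get just the raw payload.
    static long sampleBytes(long keys, int indexInterval, int keyBytes, int assumedOverheadBytes) {
        return samples(keys, indexInterval) * (keyBytes + 8L + assumedOverheadBytes);
    }

    public static void main(String[] args) {
        long keys = 400_000_000L; // ~400M rows per node, as reported in the thread
        int indexInterval = 128;  // assumption: the index_interval default
        int timeUuidBytes = 16;   // a TimeUUID row key is 16 bytes
        long raw = sampleBytes(keys, indexInterval, timeUuidBytes, 0);
        System.out.printf("samples=%d, raw payload=%.1f MiB%n",
                samples(keys, indexInterval), raw / (1024.0 * 1024.0));
    }
}
```

The raw payload alone comes to tens of MiB; the actual heap cost would be some multiple of that once object overhead is included.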
>
>
>
>
>> They were 2+ GB (as reported by nodetool cfstats anyway).  It looks like
>> bloom_filter_fp_chance defaults to 0.0
>>
>> The default should be 0.000744.
>>
>> If the chance is zero or null, this code should run when a new SSTable is
>> written:
>>   // paranoia -- we've had bugs in the thrift <-> avro <-> CfDef dance before, let's not let that break things
>>   logger.error("Bloom filter FP chance of zero isn't supposed to happen");
>>
>> Were the CFs migrated from an old version?
>>
>>
> Yes, the CFs were created in 1.0.9, then migrated to 1.0.11 and finally to
> 1.1.5, with an "upgradesstables" run at each upgrade along the way.
>
> I could not find a way to view the current bloom_filter_fp_chance setting
> when it is at the default value.  JMX reports the actual fp rate, and if a
> specific rate is set for a CF, that shows up in "describe table", but I
> couldn't find out how to tell what the default was.  I didn't inspect the
> source.
>
>
>
>> Is there any way to predict how much memory the bloom filters will
>> consume if the size of the row keys, the number of rows, and the fp chance
>> are known?
>>
>>
>> See o.a.c.utils.BloomFilter.getFilter() in the code.
>> The calculator at http://hur.st/bloomfilter appears to give similar results.
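The sizing question can be answered with the standard textbook formula for an optimally configured bloom filter, m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive chance p, with about (m/n) ln 2 hash functions. This is the formula calculators such as hur.st use; Cassandra's own BloomFilter.getFilter() chooses its sizing through its own calculations, so actual numbers may differ somewhat. A minimal sketch, plugging in the ~400M keys and the 0.000744 default mentioned in this thread:

```java
public class BloomFilterSize {
    // Optimal bits per key for target false-positive probability p: -ln(p) / (ln 2)^2.
    static double bitsPerKey(double p) {
        return -Math.log(p) / (Math.log(2) * Math.log(2));
    }

    // Optimal number of hash functions for that many bits per key: (m/n) * ln 2, at least 1.
    static int hashCount(double p) {
        return Math.max(1, (int) Math.round(bitsPerKey(p) * Math.log(2)));
    }

    // Total filter size in MiB for the given number of keys.
    static double mebibytes(long keys, double p) {
        return keys * bitsPerKey(p) / 8.0 / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long keys = 400_000_000L; // ~400M rows per node, from the thread
        double p = 0.000744;      // the default fp chance quoted above
        System.out.printf("%.2f bits/key, %d hashes, ~%.0f MiB%n",
                bitsPerKey(p), hashCount(p), mebibytes(keys, p));
    }
}
```

With these inputs the formula gives roughly 15 bits per key and a little over 700 MiB, in the same range as the figure in Bryan's reply.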
>>
>>
>>
>
> Ahh, very helpful.  This indicates that 714MB would be used for the bloom
> filter for that one CF.
>
> JMX / cfstats reports "Bloom Filter Space Used", but the MBean method name
> (getBloomFilterDiskSpaceUsed) indicates this is the on-disk space.  If
> on-disk and in-memory usage are similar, then summing all the "Bloom
> Filter Space Used" values shows they're currently consuming 1-2 GB of the heap,
> which is substantial.
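The summing can be scripted over JMX instead of adding up cfstats output by hand. A minimal sketch, assuming Cassandra registers one MBean per column family under the domain pattern "org.apache.cassandra.db:type=ColumnFamilies,*", that the getter getBloomFilterDiskSpaceUsed is exposed as the attribute "BloomFilterDiskSpaceUsed" (the usual MBean getter naming), and that JMX listens on the default port 7199. As noted above, this is the on-disk figure, so treating it as heap usage is itself an assumption:

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BloomFilterSpaceTotal {
    // Assumed naming: one MBean per column family with type=ColumnFamilies.
    static final String CF_PATTERN = "org.apache.cassandra.db:type=ColumnFamilies,*";

    // Sums the BloomFilterDiskSpaceUsed attribute across all matching column family MBeans.
    static long totalBloomFilterBytes(MBeanServerConnection conn) throws Exception {
        long total = 0;
        Set<ObjectName> names = conn.queryNames(new ObjectName(CF_PATTERN), null);
        for (ObjectName name : names) {
            Object value = conn.getAttribute(name, "BloomFilterDiskSpaceUsed");
            if (value instanceof Number) {
                total += ((Number) value).longValue();
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // Assumes the default Cassandra JMX port, 7199, with no authentication.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            long bytes = totalBloomFilterBytes(connector.getMBeanServerConnection());
            System.out.printf("Bloom filters (on-disk size): %.1f MiB%n", bytes / (1024.0 * 1024.0));
        }
    }
}
```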
>
> If a CF is rarely read, is it safe to set bloom_filter_fp_chance to 1.0?
> It just means more trips to SSTable indexes for a read, correct?  Trade RAM
> for time (disk I/O).
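The trade-off in the question can be put in rough numbers. In a simplified model that ignores the key cache and other read-path details, each SSTable that does not hold the requested key costs an index lookup only when its filter gives a false positive, so the expected number of wasted lookups per read is (number of such SSTables) × p. At p = 1.0 the filter costs no memory but every SSTable's index is consulted. A minimal sketch; the count of 5 SSTables is hypothetical:

```java
public class FpChanceTradeoff {
    // Expected index lookups wasted on SSTables that do not contain the key,
    // given 'otherSSTables' such SSTables and false-positive chance p.
    static double expectedWastedLookups(int otherSSTables, double p) {
        return otherSSTables * p;
    }

    // Textbook bloom filter sizing, -ln(p) / (ln 2)^2 bits per key; no filter (0 bits) at p >= 1.0.
    static double bitsPerKey(double p) {
        return p >= 1.0 ? 0.0 : -Math.log(p) / (Math.log(2) * Math.log(2));
    }

    public static void main(String[] args) {
        int otherSSTables = 5; // hypothetical: SSTables per read that don't hold the key
        for (double p : new double[] {0.000744, 0.01, 0.1, 1.0}) {
            System.out.printf("p=%.6f  %6.2f bits/key  %.3f wasted index lookups per read%n",
                    p, bitsPerKey(p), expectedWastedLookups(otherSSTables, p));
        }
    }
}
```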
>
> -Bryan
>
>
