incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike <mthero...@yahoo.com>
Subject Re: Diagnosing memory issues
Date Tue, 11 Dec 2012 15:29:01 GMT
Thank you for the response.  Since the time of this question, we've 
identified a number of areas that needed improving and have helped 
things along quite a bit.  To answer your question, we were seeing both 
ParNew and CMS.  There were no errors in the log, and all the nodes have 
been up.

However, we are seeing one interesting issue.  We are running a 6 node 
cluster with a Replication Factor of 3.  The nodes are pretty evenly 
balanced.  All reads and writes to Cassandra uses LOCAL_QUORUM 
consistency.  We are seeing a very interesting problem from the JMX 
statistics.  We discovered we had one column family with an extremely 
high and unexpected write count.  The writes to this column family are 
done in conjunction with other writes to other column families such that 
their numbers should be roughly equivalent, but they are off by a factor 
of 10.   We have yet to find anything in our code that could cause this 
discrepancy in numbers.

What really interesting is that we see this behavior on only 5 of the 6 
nodes in our cluster.  On 5 of the 6 nodes, we see statistics indicating 
we are writing two fast and this specific memtable is exceeding its 
memtable 128M limit, while this one other node seems to be handling the 
load ok (Memtables stay within their limites).  Given our replication 
factor, I'm not sure how this is possible.

Any hints on what might be causing this additional load?  Are there 
other activities in Cassandra might account for this increased load on a 
single column family?

Any insights would be appreciated,
-Mike


On 12/4/2012 3:33 PM, aaron morton wrote:
> For background, a discussion on estimating working set 
> http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html . 
> You can also just look at the size of tenured heap after a CMS.
>
> Are you seeing lots of ParNew or CMS ?
>
> GC activity is a result of configuration *and* workload. Look in your 
> data model for wide rows, or long lived rows that get a lot of 
> deletes, and look in your code for large reads / writes (e.g. 
> sometimes we read 100,000 columns from a row).
>
>> The number that really jumps out at me below is the number of Pending 
>> requests for the Message Service.  24,000+ pending requests.
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
>> (line 89) MessagingService                n/a    24,229
> Technically speaking that ain't right.
> The whole server looks unhappy.
>
> Are there any errors in the logs ?
> Are all the nodes up ?
>
> A very blunt approach is to reduce the in_memory_compaction_limit and 
> the concurrent_compactors or compaction_throughput_mb_per_sec. This 
> reduces the impact compaction and repair have on the system and may 
> give you breathing space to look at other causes. Once you have a feel 
> for what's going on you can turn them up.
>
> Hope that helps.
> A
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/12/2012, at 7:04 AM, Mike <mtheroux2@yahoo.com 
> <mailto:mtheroux2@yahoo.com>> wrote:
>
>> Hello,
>>
>> Our Cassandra cluster has, relatively recently, started experiencing 
>> memory pressure that I am in the midsts of diagnosing.  Our system 
>> has uneven levels of traffic, relatively light during the day, but 
>> extremely heavy during some overnight processing.  We have started 
>> getting a message:
>>
>> WARN [ScheduledTasks:1] 2012-12-04 09:08:58,579 GCInspector.java 
>> (line 145) Heap is 0.7520105072262254 full.  You may need to reduce 
>> memtable and/or cache sizes.  Cassandra will now flush up to the two 
>> largest memtables to free up memory.  Adjust 
>> flush_largest_memtables_at threshold in cassandra.yaml if you don't 
>> want Cassandra to do this automatically
>>
>> I've started implementing some instrumentation to gather stats from 
>> JMX to determine what is happening.  However, last night, the 
>> GCInspector was kind enough to log the information below.  Couple of 
>> things jumped out at me.
>>
>> The maximum heap for the Cassandra is 4GB.  We are running Cassandra 
>> 1.1.2, on a 6 node cluster, with a replication factor of 3.  All our 
>> queries use LOCAL_QUORUM consistency.
>>
>> Adding up the caches + the memtable "data" in the trace below, comes 
>> to under 600MB
>>
>> The number that really jumps out at me below is the number of Pending 
>> requests for the Message Service.  24,000+ pending requests.
>>
>> Does this number represent the number of outstanding client requests 
>> that this node is processing?  If so, does this mean we potentially 
>> have 24,000 responses being pulled into memory, thereby causing this 
>> memory issue?  What else should I look at?
>>
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,585 StatusLogger.java 
>> (line 57) Pool Name                    Active   Pending   Blocked
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,695 StatusLogger.java 
>> (line 72) ReadStage                        32        66         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java 
>> (line 72) RequestResponseStage              0       193         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java 
>> (line 72) ReadRepairStage                   0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,696 StatusLogger.java 
>> (line 72) MutationStage                     2         2         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,697 StatusLogger.java 
>> (line 72) ReplicateOnWriteStage             5         5         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java 
>> (line 72) GossipStage                       0        13         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java 
>> (line 72) AntiEntropyStage                  0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,698 StatusLogger.java 
>> (line 72) MigrationStage                    0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java 
>> (line 72) StreamStage                       0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java 
>> (line 72) MemtablePostFlusher               0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,699 StatusLogger.java 
>> (line 72) FlushWriter                       0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java 
>> (line 72) MiscStage                         0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java 
>> (line 72) commitlog_archiver                0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,700 StatusLogger.java 
>> (line 72) InternalResponseStage             0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,701 StatusLogger.java 
>> (line 72) AntiEntropySessions               0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,701 StatusLogger.java 
>> (line 72) HintedHandoff                     0         0         0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
>> (line 77) CompactionManager                 2         4
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
>> (line 89) MessagingService                n/a    24,229
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
>> (line 99) Cache Type                     Size Capacity KeysToSave 
>> Provider
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,702 StatusLogger.java 
>> (line 100) KeyCache                    2184533 2184533 all
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,703 StatusLogger.java 
>> (line 106) RowCache                   52385581 52428800 
>>                      all 
>> org.apache.cassandra.cache.SerializingCacheProvider
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,703 StatusLogger.java 
>> (line 113) ColumnFamily                Memtable ops,data
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,703 StatusLogger.java 
>> (line 116) system.NodeIdInfo                         0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,704 StatusLogger.java 
>> (line 116) system.IndexInfo                          0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,705 StatusLogger.java 
>> (line 116) system.LocationInfo                       0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,705 StatusLogger.java 
>> (line 116) system.Versions                           0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,705 StatusLogger.java 
>> (line 116) system.schema_keyspaces                   0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,705 StatusLogger.java 
>> (line 116) system.Migrations                         0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,706 StatusLogger.java 
>> (line 116) system.schema_columnfamilies                 0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,706 StatusLogger.java 
>> (line 116) system.schema_columns                     0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,706 StatusLogger.java 
>> (line 116) system.HintsColumnFamily                  0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,706 StatusLogger.java 
>> (line 116) system.Schema                             0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,707 StatusLogger.java 
>> (line 116) open.comp                           0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,707 StatusLogger.java 
>> (line 116) open.bp                            0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,707 StatusLogger.java 
>> (line 116) open.bn              312832,47184787
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,707 StatusLogger.java 
>> (line 116) open.p                       711,193201
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,707 StatusLogger.java 
>> (line 116) open.bid        273064,46316018
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,708 StatusLogger.java 
>> (line 116) open.rel                       0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,708 StatusLogger.java 
>> (line 116) open.images                              0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,708 StatusLogger.java 
>> (line 116) open.users                    62287,86665510
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,708 StatusLogger.java 
>> (line 116) open.sessions                  4710,13153051
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,709 StatusLogger.java 
>> (line 116) open.userIndices                      4,1960
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,709 StatusLogger.java 
>> (line 116) open.caches                       50,4813457
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,709 StatusLogger.java 
>> (line 116) open.content                             0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,710 StatusLogger.java 
>> (line 116) open.enrich                    30,20793
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,744 StatusLogger.java 
>> (line 116) open.bt                    1133,776831
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,863 StatusLogger.java 
>> (line 116) open.alias                      253,163933
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,864 StatusLogger.java 
>> (line 116) open.bymsgid         249610,73075517
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,864 StatusLogger.java 
>> (line 116) open.rank                319956,70898417
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,865 StatusLogger.java 
>> (line 116) open.cmap                 448,406193
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,865 StatusLogger.java 
>> (line 116) open.pmap                    659,566220
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,865 StatusLogger.java 
>> (line 116) open.pict                 50944,58659596
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,878 StatusLogger.java 
>> (line 116) open.w                            0,0
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,879 StatusLogger.java 
>> (line 116) open.s                  92395,46160381
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,879 StatusLogger.java 
>> (line 116) open.bymrel       136607,57780555
>> INFO [ScheduledTasks:1] 2012-12-04 09:00:37,879 StatusLogger.java 
>> (line 116) open.m                 26720,51150067
>>
>> It's appreciated,
>> Thanks,
>> -Mike
>


Mime
View raw message