cassandra-user mailing list archives

From Kevin Burton <bur...@spinn3r.com>
Subject Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.
Date Wed, 03 Aug 2016 15:44:41 GMT
DuyHai.  Yes.  We're generally happy with our disk throughput.  We're on
all SSD and have about 60 boxes.  The amount of data written isn't THAT
much.  Maybe 5GB max, but it's spread across 60 boxes.



On Wed, Aug 3, 2016 at 3:49 AM, DuyHai Doan <doanduyhai@gmail.com> wrote:

> On a side note, do you monitor your disk I/O to see whether the disk
> bandwidth can keep up with the huge spikes in writes? Use dstat during the
> insert storm to see if you have big values for CPU wait.
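For reference, the CPU-wait figure that tools like dstat report is derived from the iowait counter on the aggregate `cpu` line of /proc/stat. A minimal sketch of that arithmetic, using hardcoded sample lines rather than a live read so it runs anywhere (the sample values are fabricated for illustration):

```python
# Compute the iowait percentage between two /proc/stat "cpu" samples.
# The two sample lines below are fabricated, not real measurements.

def cpu_fields(stat_line):
    """Parse the aggregate 'cpu' line into a list of jiffy counters."""
    return [int(x) for x in stat_line.split()[1:]]

def iowait_pct(before, after):
    """iowait delta as a percentage of total jiffies elapsed."""
    b, a = cpu_fields(before), cpu_fields(after)
    total = sum(a) - sum(b)
    iowait = a[4] - b[4]  # the 5th value on the cpu line is iowait
    return 100.0 * iowait / total if total else 0.0

# Two fabricated samples taken ~1s apart:
t0 = "cpu  1000 0 500 8000 200 0 50 0 0 0"
t1 = "cpu  1050 0 530 8200 350 0 55 0 0 0"
print(f"iowait: {iowait_pct(t0, t1):.1f}%")  # -> iowait: 34.5%
```

A sustained high iowait during the insert storm would point at the disks rather than GC as the first bottleneck.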
>
> On Wed, Aug 3, 2016 at 12:41 PM, Ben Slater <ben.slater@instaclustr.com>
> wrote:
>
>> Yes, it looks like you have at least one 100MB partition, which is big
>> enough to cause issues. When you do lots of writes to a large partition
>> it is likely to end up getting compacted (as per the log), and compactions
>> often use a lot of memory / cause a lot of GC when they hit large
>> partitions. This, in addition to the write load, is probably pushing you
>> over the edge.
>>
>> There are some improvements in 3.6 that might help (
>> https://issues.apache.org/jira/browse/CASSANDRA-11206) but the 2.2 to
>> 3.x upgrade path seems risky at best at the moment. In any event, your best
>> solution would be to find a way to make your partitions smaller (like
>> 1/10th of the size).
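A common way to shrink partitions to roughly a tenth of their size is to add a synthetic sub-bucket column to the partition key. A hedged sketch of the idea (the key shape and names here are made up for illustration, not taken from the thread):

```python
# Split one logical time bucket into N sub-buckets by hashing a stable
# document id. Writes then spread across N partitions; readers fan out
# over the same N keys to see the whole logical bucket.
import zlib

NUM_SUB_BUCKETS = 10  # roughly 1/10th the original partition size

def partition_key(time_bucket, doc_id, n=NUM_SUB_BUCKETS):
    """Derive (time_bucket, sub_bucket); both would be partition key columns."""
    # crc32 is stable across processes, unlike Python's built-in hash()
    sub = zlib.crc32(doc_id.encode()) % n
    return (time_bucket, sub)

def read_fanout(time_bucket, n=NUM_SUB_BUCKETS):
    """All keys a reader must query to cover one logical bucket."""
    return [(time_bucket, sub) for sub in range(n)]
```

The trade-off is that paging through a bucket now means merging N partition scans instead of one, but each partition stays small enough for compaction to handle comfortably.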
>>
>> Cheers
>> Ben
>>
>> On Wed, 3 Aug 2016 at 12:35 Kevin Burton <burton@spinn3r.com> wrote:
>>
>>> I have a theory as to what I think is happening here.
>>>
>>> There is a correlation between the massive bursts of content arriving all
>>> at once and our outages.
>>>
>>> Our scheme uses large buckets of content where we write to a
>>> bucket/partition for 5 minutes, then move to a new one.  This way we can
>>> page through buckets.
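If the bucket id is derived from the write timestamp, the scheme above amounts to flooring to a 5-minute boundary. A guess at that arithmetic (the actual key derivation isn't shown anywhere in the thread):

```python
# Floor an epoch-millis timestamp to its 5-minute bucket boundary.
BUCKET_MS = 5 * 60 * 1000  # 300,000 ms

def bucket_of(ts_ms):
    """Every write in the same 5-minute window maps to the same bucket."""
    return (ts_ms // BUCKET_MS) * BUCKET_MS

# e.g. a write at epoch-millis 1470154500099 lands in bucket 1470154500000
print(bucket_of(1470154500099))  # -> 1470154500000
```

The consequence is that all writers hammer a single partition for the full 5 minutes, which is exactly the hot-partition pattern the compaction warning in the log points at.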
>>>
>>> I think what's happening is that CS is reading the entire partition into
>>> memory, then slicing through it... which would explain why it's running out
>>> of memory.
>>>
>>> system.log:WARN  [CompactionExecutor:294] 2016-08-03 02:01:55,659
>>> BigTableWriter.java:184 - Writing large partition
>>> blogindex/content_legacy_2016_08_02:1470154500099 (106107128 bytes)
>>>
>>> On Tue, Aug 2, 2016 at 6:43 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>
>>>> We have a 60 node CS cluster running 2.2.7 with about 20GB of RAM
>>>> allocated to each C* node.  We're aware of the recommended 8GB limit to
>>>> keep GC pauses short, but our memory use has been creeping up, probably
>>>> related to this bug.
>>>>
>>>> Here's what we're seeing... if we do a low level of writes we think
>>>> everything generally looks good.
>>>>
>>>> What happens is that we then need to catch up, so we do a TON of
>>>> writes all in a small time window.  Then CS nodes start dropping like
>>>> flies.  Some of them just GC frequently and are able to recover.  When they
>>>> GC like this we see pauses in the 30 second range, which cause them
>>>> to stop gossiping for a while, and they drop out of the cluster.
>>>>
>>>> This happens as a flurry around the cluster, so we're not always able to
>>>> catch which ones are doing it before they recover.  However, if we have 3
>>>> down, we mostly have a locked-up cluster.  Writes don't complete and our
>>>> app essentially locks up.
>>>>
>>>> SOME of the boxes never recover.  I'm in this state now.  We have 3-5
>>>> nodes that are in GC storms which they won't recover from.
>>>>
>>>> I reconfigured the GC settings to enable jstat.
>>>>
>>>> I was able to catch it while it was happening:
>>>>
>>>> root@util0067 ~ # sudo -u cassandra jstat -gcutil 4235 2500
>>>>   S0     S1     E      O      M     CCS    YGC     YGCT    FGC     FGCT      GCT
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>>>>   0.00 100.00 100.00  94.76  97.60  93.06  10435 1686.191   471 1139.142 2825.332
>>>>
>>>> ... as you can see, the box is legitimately out of memory: S1 and E are
>>>> completely full, and O is nearly full.
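That exhaustion check can be made mechanical with a small parser for jstat -gcutil rows. A sketch using the column layout above (the 90% threshold is an arbitrary choice for illustration):

```python
# Flag a jstat -gcutil sample as "heap exhausted" when the in-use survivor
# space, eden, and old gen are all effectively full.
COLUMNS = ["S0", "S1", "E", "O", "M", "CCS", "YGC", "YGCT", "FGC", "FGCT", "GCT"]

def parse_gcutil(row):
    """Map one jstat -gcutil data row to {column: float}."""
    return dict(zip(COLUMNS, (float(v) for v in row.split())))

def heap_exhausted(sample, threshold=90.0):
    # Under a copying collector one survivor space is always empty,
    # so check the fuller of S0/S1 rather than both.
    survivor = max(sample["S0"], sample["S1"])
    return all(v >= threshold for v in (survivor, sample["E"], sample["O"]))

row = "0.00 100.00 100.00 94.76 97.60 93.06 10435 1686.191 471 1139.142 2825.332"
print(heap_exhausted(parse_gcutil(row)))  # -> True
```

The repeated identical rows are also telling: YGC, FGC, and the cumulative GC times never advance across samples, so the collector is making no progress at all.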
>>>>
>>>> I'm not sure where to go from here.  I think 20GB for our workload is
>>>> more than reasonable.
>>>>
>>>> 90% of the time they're well below 10GB of RAM used.  While I was
>>>> watching this box I saw 30% RAM used until it decided to climb to
>>>> 100%.
>>>>
>>>> Any advice on what to do next... I don't see anything obvious in the
>>>> logs that signals a problem.
>>>>
>>>> I attached all the command line arguments we use.  Note that I think
>>>> that the cassandra-env.sh script puts them in there twice.
>>>>
>>>> -ea
>>>> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
>>>> -XX:+CMSClassUnloadingEnabled
>>>> -XX:+UseThreadPriorities
>>>> -XX:ThreadPriorityPolicy=42
>>>> -Xms20000M
>>>> -Xmx20000M
>>>> -Xmn4096M
>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>> -Xss256k
>>>> -XX:StringTableSize=1000003
>>>> -XX:+UseParNewGC
>>>> -XX:+UseConcMarkSweepGC
>>>> -XX:+CMSParallelRemarkEnabled
>>>> -XX:SurvivorRatio=8
>>>> -XX:MaxTenuringThreshold=1
>>>> -XX:CMSInitiatingOccupancyFraction=75
>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>> -XX:+UseTLAB
>>>> -XX:CompileCommandFile=/hotspot_compiler
>>>> -XX:CMSWaitDuration=10000
>>>> -XX:+CMSParallelInitialMarkEnabled
>>>> -XX:+CMSEdenChunksRecordAlways
>>>> -XX:CMSWaitDuration=10000
>>>> -XX:+UseCondCardMark
>>>> -XX:+PrintGCDetails
>>>> -XX:+PrintGCDateStamps
>>>> -XX:+PrintHeapAtGC
>>>> -XX:+PrintTenuringDistribution
>>>> -XX:+PrintGCApplicationStoppedTime
>>>> -XX:+PrintPromotionFailure
>>>> -XX:PrintFLSStatistics=1
>>>> -Xloggc:/var/log/cassandra/gc.log
>>>> -XX:+UseGCLogFileRotation
>>>> -XX:NumberOfGCLogFiles=10
>>>> -XX:GCLogFileSize=10M
>>>> -Djava.net.preferIPv4Stack=true
>>>> -Dcom.sun.management.jmxremote.port=7199
>>>> -Dcom.sun.management.jmxremote.rmi.port=7199
>>>> -Dcom.sun.management.jmxremote.ssl=false
>>>> -Dcom.sun.management.jmxremote.authenticate=false
>>>> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
>>>> -XX:+UnlockCommercialFeatures
>>>> -XX:+FlightRecorder
>>>> -ea
>>>> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar
>>>> -XX:+CMSClassUnloadingEnabled
>>>> -XX:+UseThreadPriorities
>>>> -XX:ThreadPriorityPolicy=42
>>>> -Xms20000M
>>>> -Xmx20000M
>>>> -Xmn4096M
>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>> -Xss256k
>>>> -XX:StringTableSize=1000003
>>>> -XX:+UseParNewGC
>>>> -XX:+UseConcMarkSweepGC
>>>> -XX:+CMSParallelRemarkEnabled
>>>> -XX:SurvivorRatio=8
>>>> -XX:MaxTenuringThreshold=1
>>>> -XX:CMSInitiatingOccupancyFraction=75
>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>> -XX:+UseTLAB
>>>> -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler
>>>> -XX:CMSWaitDuration=10000
>>>> -XX:+CMSParallelInitialMarkEnabled
>>>> -XX:+CMSEdenChunksRecordAlways
>>>> -XX:CMSWaitDuration=10000
>>>> -XX:+UseCondCardMark
>>>> -XX:+PrintGCDetails
>>>> -XX:+PrintGCDateStamps
>>>> -XX:+PrintHeapAtGC
>>>> -XX:+PrintTenuringDistribution
>>>> -XX:+PrintGCApplicationStoppedTime
>>>> -XX:+PrintPromotionFailure
>>>> -XX:PrintFLSStatistics=1
>>>> -Xloggc:/var/log/cassandra/gc.log
>>>> -XX:+UseGCLogFileRotation
>>>> -XX:NumberOfGCLogFiles=10
>>>> -XX:GCLogFileSize=10M
>>>> -Djava.net.preferIPv4Stack=true
>>>> -Dcom.sun.management.jmxremote.port=7199
>>>> -Dcom.sun.management.jmxremote.rmi.port=7199
>>>> -Dcom.sun.management.jmxremote.ssl=false
>>>> -Dcom.sun.management.jmxremote.authenticate=false
>>>> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin
>>>> -XX:+UnlockCommercialFeatures
>>>> -XX:+FlightRecorder
>>>> -Dlogback.configurationFile=logback.xml
>>>> -Dcassandra.logdir=/var/log/cassandra
>>>> -Dcassandra.storagedir=
>>>> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
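The suspected duplication from cassandra-env.sh is easy to confirm mechanically by counting repeated arguments. A sketch (the argument list here is abbreviated for illustration, not the full set above):

```python
# Count JVM arguments that appear more than once on a command line.
from collections import Counter

def duplicated_flags(args):
    """Return {flag: count} for every flag that appears more than once."""
    return {f: c for f, c in Counter(args).items() if c > 1}

# Abbreviated sample mimicking the doubled list above:
args = ["-Xms20000M", "-Xmx20000M", "-XX:CMSWaitDuration=10000",
        "-Xms20000M", "-Xmx20000M", "-XX:CMSWaitDuration=10000"]
print(duplicated_flags(args))
```

Note that -XX:CMSWaitDuration=10000 appears twice even within a single copy of the list above, so some of the duplication predates cassandra-env.sh doubling the whole set. For most flags the JVM takes the last occurrence, but it is worth cleaning up to rule it out.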
>>>>
>>>>
>>>> --
>>>>
>>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>>> Engineers!
>>>>
>>>> Founder/CEO Spinn3r.com
>>>> Location: *San Francisco, CA*
>>>> blog: http://burtonator.wordpress.com
>>>> … or check out my Google+ profile
>>>> <https://plus.google.com/102718274791889610666/posts>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>> ————————
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>>
>
>


