Date: Mon, 9 Feb 2015 06:56:35 +0000 (UTC)
From: "Brent Haines (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-8723) Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer

    [ https://issues.apache.org/jira/browse/CASSANDRA-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311864#comment-14311864 ]

Brent Haines edited comment on CASSANDRA-8723 at 2/9/15 6:55 AM:
-----------------------------------------------------------------

[~jeffl] I ran
{code}
watch -n 10 'nodetool compactionstats'
{code}
on the affected node and watched it for a while. For us it would always end up on the same compaction, of the same CF, where it would get stuck until the OOM happened. The stats on the compaction give you a hint -- the total number of bytes is the same each time; it gets some portion of the way through the compaction, then progress freezes and eventually the system runs OOM.

We have the standard replication factor of 3, so it was no big deal to stop Cassandra, delete the node's storage for that CF, and then restart and run repair. Care must be taken, obviously, but it recovered steady state for us in 3 separate incidents. Once it's fixed on a node, we haven't had the issue return for that node.
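For reference, a minimal sketch of that recovery sequence, assuming a packaged install with the default data directory; the keyspace/table names are placeholders, and the {{nodetool drain}} step and the service commands are assumptions rather than part of the report above:

{code}
# Sketch only -- adjust keyspace/table names, data directory, and service commands to your setup.
nodetool drain                     # assumed step: flush memtables and stop accepting writes
sudo service cassandra stop
# Remove this node's SSTables for the stuck CF only; with RF=3 the other replicas still hold the data.
rm -rf /var/lib/cassandra/data/my_keyspace/my_table-*
sudo service cassandra start
# Stream the deleted data back from the other replicas.
nodetool repair my_keyspace my_table
{code}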
> Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8723
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8723
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeff Liu
>             Fix For: 2.1.3
>
>         Attachments: cassandra.yaml
>
>
> Issue:
> We have an on-going issue with cassandra nodes running with continuously increasing memory until killed by OOM.
> {noformat}
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783481] Out of memory: Kill process 13919 (java) score 911 or sacrifice child
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783557] Killed process 13919 (java) total-vm:18366340kB, anon-rss:6461472kB, file-rss:6684kB
> {noformat}
> System Profile:
> cassandra version 2.1.2
> system: aws c1.xlarge instance with 8 cores, 7.1G memory.
> cassandra jvm:
> -Xms1792M -Xmx1792M -Xmn400M -Xss256k
> {noformat}
> java -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.8.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1792M -Xmx1792M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:+UseCondCardMark -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1421511249.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=48M -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/usr/share/java/graphite-reporter-agent-1.0-SNAPSHOT.jar=graphiteServer=metrics-a.hq.nest.com;graphitePort=2003;graphitePollInt=60 -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp /etc/cassandra:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.8.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/metrics-graphite-2.2.0.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/stringtemplate-4.0.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.1.2.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.2.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/cassandra-driver-core-2.0.5.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar: -XX:HeapDumpPath=/var/lib/cassandra/java_1421511248.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1421511248.log org.apache.cassandra.service.CassandraDaemon
> {noformat}
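One way to observe the growth described above, independent of the JVM's own heap figures (the kernel log shows anon-rss far above the 1792M heap), is to sample the process RSS over time. A minimal shell sketch; the process match and the one-minute interval are assumptions, not part of the original report:

{code}
# Sketch: log the Cassandra process RSS/VSZ (in kB) once a minute.
pid=$(pgrep -f CassandraDaemon | head -n 1)
while true; do
    printf '%s rss/vsz kB: ' "$(date '+%F %T')"
    ps -o rss=,vsz= -p "$pid"
    sleep 60
done
{code}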
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)