Date: Mon, 9 Feb 2015 06:56:35 +0000 (UTC)
From: "Brent Haines (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-8723) Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer

    [ https://issues.apache.org/jira/browse/CASSANDRA-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311864#comment-14311864 ]

Brent Haines edited comment on CASSANDRA-8723 at 2/9/15 6:55 AM:
-----------------------------------------------------------------

[~jeffl] I ran
{code}
watch -n 10 'nodetool compactionstats'
{code}
on the affected node and watched it for a while. For us it would always end up on the same compaction, of the same CF, where it would get stuck until the OOM happened. The stats on the compaction give you a hint -- the total number of bytes is the same each time; it gets some portion of the way through the compaction, then progress freezes and eventually the system runs OOM.

We have the standard replication factor of 3, so it was no big deal to stop Cassandra, delete the node's storage for that CF, and then restart and run repair. Care must be taken, obviously, but it recovered steady state for us in 3 separate incidents. Once it's fixed on a node, we haven't had the issue return for that node.
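For reference, a minimal sketch of that recovery sequence, assuming a packaged install with the default data directory; the keyspace/table names are placeholders, and the {{nodetool drain}} step and the service commands are assumptions rather than part of the report above:

{code}
# Sketch only -- adjust keyspace/table names, data directory, and service commands to your setup.
nodetool drain                     # assumed step: flush memtables and stop accepting writes
sudo service cassandra stop
# Remove this node's SSTables for the stuck CF only; with RF=3 the other replicas still hold the data.
rm -rf /var/lib/cassandra/data/my_keyspace/my_table-*
sudo service cassandra start
# Stream the deleted data back from the other replicas.
nodetool repair my_keyspace my_table
{code}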
> Cassandra 2.1.2 Memory issue - java process memory usage continuously increases until process is killed by OOM killer
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8723
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8723
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeff Liu
>             Fix For: 2.1.3
>
>         Attachments: cassandra.yaml
>
>
> Issue:
> We have an on-going issue with cassandra nodes running with continuously increasing memory until killed by OOM.
> {noformat}
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783481] Out of memory: Kill process 13919 (java) score 911 or sacrifice child
> Jan 29 10:15:41 cass-chisel19 kernel: [24533109.783557] Killed process 13919 (java) total-vm:18366340kB, anon-rss:6461472kB, file-rss:6684kB
> {noformat}
> System Profile:
> cassandra version 2.1.2
> system: aws c1.xlarge instance with 8 cores, 7.1G memory.
> cassandra jvm:
> -Xms1792M -Xmx1792M -Xmn400M -Xss256k
> {noformat}
> java -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.8.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1792M -Xmx1792M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:+UseCondCardMark -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1421511249.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=48M -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -javaagent:/usr/share/java/graphite-reporter-agent-1.0-SNAPSHOT.jar=graphiteServer=metrics-a.hq.nest.com;graphitePort=2003;graphitePollInt=60 -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp /etc/cassandra:/usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.8.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/logback-classic-1.1.2.jar:/usr/share/cassandra/lib/logback-core-1.1.2.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/metrics-graphite-2.2.0.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.2.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/stringtemplate-4.0.2.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.1.2.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.2.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/cassandra-driver-core-2.0.5.jar:/usr/share/cassandra/netty-3.9.0.Final.jar:/usr/share/cassandra/stress.jar: -XX:HeapDumpPath=/var/lib/cassandra/java_1421511248.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1421511248.log org.apache.cassandra.service.CassandraDaemon
> {noformat}
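One way to observe the growth described above, independent of the JVM's own heap figures (the kernel log shows anon-rss far above the 1792M heap), is to sample the process RSS over time. A minimal shell sketch; the process match and the one-minute interval are assumptions, not part of the original report:

{code}
# Sketch: log the Cassandra process RSS/VSZ (in kB) once a minute.
pid=$(pgrep -f CassandraDaemon | head -n 1)
while true; do
    printf '%s rss/vsz kB: ' "$(date '+%F %T')"
    ps -o rss=,vsz= -p "$pid"
    sleep 60
done
{code}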
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)