From: Sasha Dolgy
Date: Wed, 22 Jun 2011 14:41:45 +0200
Subject: Re: OOM (or, what settings to use on AWS large?)
To: user@cassandra.apache.org

We had a similar problem last month and found that the OS eventually
killed the Cassandra process on each of our nodes ... I've upgraded to
0.8.0 from 0.7.6-2 and have not had the problem since, but I do see
memory consumption rising consistently from one day to the next on each
node ..

On Wed, Jun 1, 2011 at 2:30 PM, Sasha Dolgy wrote:
> is there a specific string I should be looking for in the logs that
> isn't super obvious to me at the moment...
>
> On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis wrote:
>> The place to start is with the statistics Cassandra logs after each GC.

look for GCInspector

I found this in the logs on all my servers but never did much after that....

On Wed, Jun 22, 2011 at 2:33 PM, William Oberman wrote:
> I woke up this morning to all 4 of 4 of my cassandra instances reporting
> they were down in my cluster.  I quickly started them all, and everything
> seems fine.  I'm doing a postmortem now, but it appears they all OOM'd at
> roughly the same time, which was not reported in any cassandra log, but I
> discovered something in /var/log/kern that showed java died of oom(*).
> In amazon, I'm using large instances for cassandra, and they have no swap (as
> recommended), so I have ~8GB of ram.  Should I use a different max mem
> setting?  I'm using a stock rpm from riptano/datastax.  If I run "ps -aux" I
> get:
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
> -Xms3843M -Xmx3843M -Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=X.X.X.X
> -Dcom.sun.management.jmxremote.port=8080
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false -Dmx4jaddress=0.0.0.0
> -Dmx4jport=8081 -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true
> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp
> :/etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.1.3.jar:/usr/share/cassandra/lib/apache-cassandra-0.7.4.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-collections-3.2.1.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.1.jar:/usr/share/cassandra/lib/guava-r05.jar:/usr/share/cassandra/lib/high-scale-lib.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jetty-6.1.21.jar:/usr/share/cassandra/lib/jetty-util-6.1.21.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jug-2.0.0.jar:/usr/share/cassandra/lib/libthrift-0.5.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar
> org.apache.cassandra.thrift.CassandraDaemon
> (*) Also, why would they all OOM so close to each other?  Bad luck?  Or once
> the first node went down, is there an increased chance of the rest?
> I'm still on 0.7.4, when I released cassandra to production that was the
> latest release.  In addition to (or instead of?) fixing memory settings, I'm
> guessing I should upgrade.
> will
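
For anyone following the "look for GCInspector" suggestion, here is a minimal Python sketch (not something shipped with Cassandra) that prints the lines mentioning GCInspector or oom from whatever log files you pass it. The function name matching_lines and the idea of passing the log paths on the command line are my own assumptions; the thread itself only names the GCInspector string and /var/log/kern.

```python
import sys


def matching_lines(lines, needles):
    """Return the lines containing any of the needles, ignoring case."""
    lowered = [n.lower() for n in needles]
    return [
        line.rstrip("\n")
        for line in lines
        if any(n in line.lower() for n in lowered)
    ]


if __name__ == "__main__":
    # Hypothetical usage: python scan_logs.py <log file> [<log file> ...]
    # The search strings come from the thread; the match is a plain
    # case-insensitive substring test, nothing log-format specific.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for hit in matching_lines(fh, ["GCInspector", "oom"]):
                print(f"{path}: {hit}")
```

The substring test on "oom" is deliberately loose, so it will also hit unrelated words such as "room"; tighten the search strings once you know what your kernel and Cassandra logs actually print.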