From: Aram Ayazyan
Date: Wed, 1 Dec 2010 17:07:02 -0800
Subject: Re: OutOfMemory exceptions w/ Cassandra 0.6.8
To: user@cassandra.apache.org

Regarding caches, I haven't explicitly enabled them and the
"saved_caches" directory is empty.

-Aram

On Wed, Dec 1, 2010 at 5:05 PM, Aram Ayazyan wrote:
> Hi Aaron,
>
> OOM is happening both after the system has been running for a while as
> well as when I restart it afterwards. The only way to make it run
> after it has crashed, is to remove everything from data and commitlog
> directories. Unfortunately I don't have the original log from when
> cassandra crashed earlier, but might have some soon if another node
> crashes.
>
> This particular exception happened during start-up:
> ERROR [main] 2010-12-01 14:58:37,795 CassandraDaemon.java (line 242)
> Exception encountered during startup.
> java.lang.OutOfMemoryError: unable to create new native thread
>        at java.lang.Thread.start0(Native Method)
>        at java.lang.Thread.start(Thread.java:597)
>        at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.<init>(PeriodicCommitLogExecutorService.java:57)
>        at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService.<init>(PeriodicCommitLogExecutorService.java:40)
>        at org.apache.cassandra.db.commitlog.CommitLog.<init>(CommitLog.java:117)
>        at org.apache.cassandra.db.commitlog.CommitLog.<init>(CommitLog.java:71)
>        at org.apache.cassandra.db.commitlog.CommitLog$CLHandle.<clinit>(CommitLog.java:85)
>        at org.apache.cassandra.db.commitlog.CommitLog.instance(CommitLog.java:80)
>        at org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:469)
>        at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:517)
>        at org.apache.cassandra.db.Table.flush(Table.java:431)
>        at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:291)
>        at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
>        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
>        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
>
> And here is the full GC log: http://pastebin.com/XGRSRcBd (all 21
> seconds of it).
>
> Thank you,
> Aram
>
> On Wed, Dec 1, 2010 at 4:55 PM, Aaron Morton wrote:
>> Do you have a log message for the OOM? And some GC messages around it? Have
>> you tried watching the server with jconsole?
>> Is the OOM happening on system start or after it's been running? Or both?
>> Do you have any row/key caches? I cannot remember whether 0.6.* has this,
>> but have you enabled the save cache feature?
>> Aaron
>>
>> On 02 Dec, 2010, at 01:28 PM, Aram Ayazyan wrote:
>>
>> Hi,
>>
>> We have a small cluster of 3 Cassandra servers running w/ full
>> replication. Every once in a while we get an OutOfMemory exception and
>> have to restart servers. Sometimes just restarting doesn't do it and
>> we have to clean the commitlog or data directory.
>>
>> We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
>> families. There are less than 1000 keys across all column families.
>> There is roughly 1 write request per second and 1 read request. Each
>> server is allocated 1GB. Size of all files in data directory of the
>> only column family is ~300MB. MemtableThroughputInMB is throttled way
>> down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
>> were running out of memory extremely fast, this way it works for a
>> couple of days w/o crashing).
>>
>> Last time this issue happened, I didn't clear the commitlog/data
>> folders, enabled gc logging and restarted Cassandra. It crashes really
>> fast, but what is really strange is that it seems like it still has
>> plenty of memory when the error happens, last 3 lines from gc log:
>> 21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
>> 21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
>> 21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
>> The full log is here: http://pastebin.com/XGRSRcBd
>>
>> I've tried increasing the memory up to 1.5GB, but it still doesn't start.
>>
>> Any ideas what might be the problem here?
>>
>> Thank you,
>> Aram
>>
>
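
To put numbers on the "plenty of memory" observation in the original message, the three quoted GC lines can be parsed to compare heap occupancy after each collection with the total heap capacity. What follows is only a minimal Python sketch, assuming the simple `uptime: [GC before->after(capacity), pause secs]` line format shown in the quoted log; the names `GC_PATTERN` and `parse_gc_line` are made up for this illustration and are not part of Cassandra or the JVM.

```python
import re

# The three GC log lines quoted in the original message.
GC_LINES = [
    "21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]",
    "21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]",
    "21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]",
]

# Assumed layout: <uptime>: [GC <before>K-><after>K(<capacity>K), <pause> secs]
GC_PATTERN = re.compile(
    r"(?P<uptime>\d+\.\d+): \[GC (?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<capacity>\d+)K\), (?P<pause>\d+\.\d+) secs\]"
)


def parse_gc_line(line):
    """Return a dict of parsed values, or None if the line does not match."""
    match = GC_PATTERN.search(line)
    if match is None:
        return None
    after_kb = int(match.group("after"))
    capacity_kb = int(match.group("capacity"))
    return {
        "uptime_s": float(match.group("uptime")),
        "before_kb": int(match.group("before")),
        "after_kb": after_kb,
        "capacity_kb": capacity_kb,
        "pause_s": float(match.group("pause")),
        # Share of the total heap still occupied after this collection.
        "used_fraction": after_kb / capacity_kb,
    }


if __name__ == "__main__":
    for line in GC_LINES:
        rec = parse_gc_line(line)
        print(
            f"{rec['uptime_s']:.3f}s: {rec['after_kb']}K of "
            f"{rec['capacity_kb']}K ({rec['used_fraction']:.1%} used after GC)"
        )
```

On these three lines the heap is well under half full after each collection, which matches the observation that memory was still available. It may also be worth noting that the quoted exception is raised from `Thread.start0`, that is, while starting a thread rather than while allocating on the heap.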