From: Jonathan Ellis
Date: Fri, 12 Feb 2010 15:09:10 -0600
Subject: Re: OOM on restart
To: cassandra-user@incubator.apache.org
In-Reply-To: <20100212205246.GA16171@alumni.caltech.edu>
References: <20100211181027.GA4022@alumni.caltech.edu> <20100212205246.GA16171@alumni.caltech.edu>

0.5 allows 1 + 2 * Runtime.getRuntime().availableProcessors()
Memtables + 1 per DataFileLocation to be waiting for flush before it
will block writes (or log replay) to give those time to flush out.

So, it sounds like you just need to lower your Memtable max size/object count.

On Fri, Feb 12, 2010 at 2:52 PM, Anthony Molinaro wrote:
> 0.5.0 final.  I was able to get things going again by upping the memory
> then lowering it after a successful restart, but I would like to know how
> to minimize the chances of OOM via tuning.
>
> -Anthony
>
> On Thu, Feb 11, 2010 at 01:29:16PM -0600, Jonathan Ellis wrote:
>> What version are you on these days? :)
>>
>> On Thu, Feb 11, 2010 at 12:11 PM, Anthony Molinaro
>> wrote:
>> > Hi,
>> >
>> >  I've been having nodes failing recently with OOM exceptions (not sure
>> > why, but we have had an increase in traffic so that could be a cause).
>> > Most nodes have restarted fine, one node however, has been having problems
>> > restarting.
>> > It was failing with
>> >
>> > java.lang.OutOfMemoryError: Java heap space
>> >        at java.util.Arrays.copyOfRange(Arrays.java:3209)
>> >        at java.lang.String.<init>(String.java:216)
>> >        at java.io.DataInputStream.readUTF(DataInputStream.java:644)
>> >        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>> >        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:104)
>> >        at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:308)
>> >        at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
>> >        at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:271)
>> >        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:338)
>> >        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:65)
>> >        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:90)
>> >        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:166)
>> >
>> > And
>> >
>> > java.lang.OutOfMemoryError: Java heap space
>> >        at java.lang.StringCoding.encode(StringCoding.java:266)
>> >        at java.lang.StringCoding.encode(StringCoding.java:284)
>> >        at java.lang.String.getBytes(String.java:987)
>> >        at org.apache.cassandra.utils.FBUtilities.hash(FBUtilities.java:178)
>> >        at org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:116)
>> >        at org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:44)
>> >        at org.apache.cassandra.db.Memtable.resolve(Memtable.java:148)
>> >        at org.apache.cassandra.db.Memtable.put(Memtable.java:143)
>> >        at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:478)
>> >        at org.apache.cassandra.db.Table.apply(Table.java:445)
>> >        at org.apache.cassandra.db.CommitLog$3.run(CommitLog.java:365)
>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >        at java.lang.Thread.run(Thread.java:619)
>> >
>> > I upped the Xmx value from 4G to 6G and it seems to be doing okay, however
>> > it seems odd that it can run mostly fine with 4G, but fail to restart with
>> > that much memory.  Maybe this ticket's issue is back?
>> >
>> > https://issues.apache.org/jira/browse/CASSANDRA-609
>> >
>> > Anyway, I'm hoping things will recover with 6G then I can restart again with 4G and things will be good.
>> >
>> > I'd also like a better understanding of why cassandra might OOM in general.
>> > Are there settings which minimize the chances of OOM?  This instance has
>> > 2 column families and I have
>> >
>> >  512
>> >  1.0
>> >  1440
>> >
>> > So if I understand these settings, memtables can at most be 512MB in size
>> > or consist of 1 million objects before they are flushed to disk.  The maximum
>> > time before they will be flushed is 24 hours.  So does that mean if I fill
>> > up 8G or 16 memtables in less than 24 hours, I've basically used all the
>> > memory available to me?  I assume there are other things using memory,
>> > (indexes, etc), how is that limited?  Anyway, any information about what
>> > is used where would be appreciated.
>> >
>> > Thanks,
>> >
>> > -Anthony
>> >
>> > --
>> > ------------------------------------------------------------------------
>> > Anthony Molinaro                           <anthonym@alumni.caltech.edu>
>> >
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro
>
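
The flush-queue limit described in the reply at the top of this message can be turned into a rough sizing check. The sketch below is not Cassandra code: the class and method names are hypothetical, it assumes the limit reads as 1 + 2 * processors + one per DataFileLocation, and it treats the configured memtable size threshold (512 MB in this thread) as the memory held by each queued memtable, which the real in-heap footprint may not match. The single DataFileLocation is also an assumption, since the thread does not say how many are configured.

```java
/**
 * Back-of-the-envelope estimate of memory held by memtables waiting for flush.
 * Hypothetical helper for illustration only; not part of Cassandra.
 */
final class FlushQueueEstimate {

    /**
     * Number of memtables allowed to wait for flush before writes block,
     * under one reading of the reply: 1 + 2 * processors + 1 per DataFileLocation.
     */
    static int maxPendingMemtables(int processors, int dataFileLocations) {
        return 1 + 2 * processors + dataFileLocations;
    }

    /**
     * Rough memory figure in MB for a full flush queue, assuming each queued
     * memtable is as large as the configured size threshold.
     */
    static long pendingMemtableMb(int processors, int dataFileLocations, int memtableSizeMb) {
        return (long) maxPendingMemtables(processors, dataFileLocations) * memtableSizeMb;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        int dataFileLocations = 1; // assumption: not stated in the thread
        int memtableSizeMb = 512;  // memtable size threshold quoted in the thread
        long heapMb = 4096;        // the 4G -Xmx quoted in the thread

        long pendingMb = pendingMemtableMb(processors, dataFileLocations, memtableSizeMb);
        System.out.printf("processors=%d, pending memtables=%d, estimate=%d MB, heap=%d MB%n",
                processors, maxPendingMemtables(processors, dataFileLocations), pendingMb, heapMb);
        if (pendingMb > heapMb) {
            System.out.println("A full flush queue could exceed the heap; consider lowering the memtable thresholds.");
        }
    }
}
```

On a machine with 8 processors (an assumed figure), this reading gives 18 queued memtables, or 9216 MB at the 512 MB threshold, which is well above a 4G heap and in line with the advice to lower the memtable size or object count.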