Date: Thu, 11 Feb 2010 10:11:18 -0800
From: Anthony Molinaro
To: cassandra-user@incubator.apache.org
Subject: OOM on restart

Hi,

I've been having nodes fail recently with OOM exceptions (not sure why, but we have had an increase in traffic, so that could be a cause). Most nodes have restarted fine; one node, however, has been having problems restarting.
It was failing with:

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at java.io.DataInputStream.readUTF(DataInputStream.java:644)
        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:104)
        at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:308)
        at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
        at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:271)
        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:338)
        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:65)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:90)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:166)

and:

java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding.encode(StringCoding.java:266)
        at java.lang.StringCoding.encode(StringCoding.java:284)
        at java.lang.String.getBytes(String.java:987)
        at org.apache.cassandra.utils.FBUtilities.hash(FBUtilities.java:178)
        at org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:116)
        at org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:44)
        at org.apache.cassandra.db.Memtable.resolve(Memtable.java:148)
        at org.apache.cassandra.db.Memtable.put(Memtable.java:143)
        at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:478)
        at org.apache.cassandra.db.Table.apply(Table.java:445)
        at org.apache.cassandra.db.CommitLog$3.run(CommitLog.java:365)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

I upped the Xmx value from 4G to 6G and it seems to be doing okay; however, it seems odd that it can
run mostly fine with 4G but fail to restart with that much memory. Maybe this ticket's issue is back?

https://issues.apache.org/jira/browse/CASSANDRA-609

Anyway, I'm hoping things will recover with 6G; then I can restart again with 4G and things will be good.

I'd also like a better understanding of why Cassandra might OOM in general. Are there settings which minimize the chances of an OOM? This instance has 2 column families, and my memtable settings are:

  512    (memtable size, MB)
  1.0    (memtable object count, millions)
  1440   (flush after, minutes)

So if I understand these settings, memtables can be at most 512MB in size or contain 1 million objects before they are flushed to disk, and the maximum time before they are flushed is 24 hours. Does that mean that if I fill up 8G (16 memtables) in less than 24 hours, I've basically used all the memory available to me? I assume other things use memory too (indexes, etc.); how is that limited?

Anyway, any information about what is used where would be appreciated.

Thanks,
-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro
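For reference, the flush rule as understood in the question (flush when any one of 512MB, 1.0 million objects, or 1440 minutes is reached) and the 16 × 512MB = 8192MB (8G) arithmetic can be sketched as below. This is a hypothetical model of that reading only, not Cassandra's actual memtable code; the class and method names are invented for illustration, and it assumes heap use equals the configured size limit, ignoring JVM object overhead, memtables still waiting to be flushed, and other heap consumers such as indexes.

```java
// Hypothetical sketch: models the memtable thresholds as described in the
// question above (512 MB, 1.0 million objects, 1440 minutes).
// MemtableFlushSketch, shouldFlush and worstCaseMemtableMb are invented names;
// this is NOT Cassandra's implementation.
class MemtableFlushSketch {
    static final long SIZE_THRESHOLD_MB = 512;           // 512 (MB)
    static final double OBJECT_THRESHOLD_MILLIONS = 1.0; // 1.0 (million objects)
    static final long FLUSH_AFTER_MINUTES = 1440;        // 1440 minutes = 24 hours

    /** Flush once ANY one of the three thresholds has been reached. */
    static boolean shouldFlush(long sizeMb, double objectsMillions, long ageMinutes) {
        return sizeMb >= SIZE_THRESHOLD_MB
            || objectsMillions >= OBJECT_THRESHOLD_MILLIONS
            || ageMinutes >= FLUSH_AFTER_MINUTES;
    }

    /** Naive upper bound: every unflushed memtable sits at the size limit. */
    static long worstCaseMemtableMb(int memtables) {
        return memtables * SIZE_THRESHOLD_MB;
    }

    public static void main(String[] args) {
        // 2 column families, one full memtable each.
        System.out.println("2 memtables: " + worstCaseMemtableMb(2) + " MB");
        // The question's scenario: 16 full memtables.
        System.out.println("16 memtables: " + worstCaseMemtableMb(16) + " MB");
    }
}
```

Under these assumptions, two column families with one full memtable each would hold about 1024MB, and 16 full memtables would hold 8192MB, which already exceeds a 6G heap before anything else is counted.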