From: Brian Burruss
To: cassandra-user@incubator.apache.org
Date: Wed, 16 Dec 2009 15:39:12 -0800
Subject: RE: OOM Exception

Is this what you want? They are big, so I'd rather not spam everyone with them. If you need them, or the hprof files, I can tar them up and send them to you. Thx!

[bburruss@gen-app02 cassandra]$ ls -l ~/cassandra/btoddb/commitlog/
total 597228
-rw-rw-r-- 1 bburruss bburruss 134219796 Dec 16 13:52 CommitLog-1260995895123.log
-rw-rw-r-- 1 bburruss bburruss 134218547 Dec 16 13:52 CommitLog-1260997811317.log
-rw-rw-r-- 1 bburruss bburruss 134218331 Dec 16 13:52 CommitLog-1260998497744.log
-rw-rw-r-- 1 bburruss bburruss 134219677 Dec 16 13:53 CommitLog-1261000330587.log
-rw-rw-r-- 1 bburruss bburruss  74055680 Dec 16 14:49 CommitLog-1261000439079.log
[bburruss@gen-app02 cassandra]$

________________________________________
From: Jonathan Ellis [jbellis@gmail.com]
Sent: Wednesday, December 16, 2009 3:29 PM
To: cassandra-user@incubator.apache.org
Subject: Re: OOM Exception

How large are the log files being replayed?

Can you attach the log from a replay attempt?

On Wed, Dec 16, 2009 at 5:21 PM, Brian Burruss wrote:
> Sorry, thought I included everything ;)
>
> However, I am using beta2.
>
> ________________________________________
> From: Jonathan Ellis [jbellis@gmail.com]
> Sent: Wednesday, December 16, 2009 3:18 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: OOM Exception
>
> What version are you using? 0.5 beta2 fixes the
> using-more-memory-on-startup problem.
>
> On Wed, Dec 16, 2009 at 5:16 PM, Brian Burruss wrote:
>> I'll put my question first:
>>
>> - How can I determine how much RAM Cassandra requires?
>>   (for normal operation and for restarting the server)
>>
>> *** I've attached my storage-conf.xml.
>>
>> I've gotten several more OOM exceptions since I mentioned this a week or so ago. I started from a fresh database a couple of days ago and have been adding 2k blocks of data, keyed off a random integer, at a rate of about 400/sec. I have a 2-node cluster with RF=2, and the consistency level for both reads and writes is ONE. There are ~70,420,082 of these 2k blocks in the database.
>>
>> I used the default memory setting of -Xmx1G when I started a couple of days ago. When the database had grown to ~180G (as reported by the Unix du command), both servers OOM'ed at about the same time, within 10 minutes of each other. Needless to say, my cluster is dead. So I upped the heap to 3G and the servers tried to come back up, but one died again with an OOM.
>>
>> Before wiping the disk and starting over a couple of days ago, I played the game of "jack up the RAM", but when I got to 5G I didn't want to raise it any further. The parameter SSTable.INDEX_INTERVAL, discussed a few days ago, changes the number of "keys" cached in memory, so I could modify it at the cost of read performance; but doing the math, 3G should be plenty of room.
>>
>> It seems like startup requires more RAM than normal operation does.
>>
>> So this of course concerns me.
>>
>> I have the hprof files from when the server initially crashed and from when it crashed trying to restart, if anyone wants them.
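Jonathan's question about how large the replayed log files are can be answered by totalling the ls -l listing in Brian's reply. A minimal Python sketch follows; the file names and byte counts are copied from that listing, while the helper name total_replay_bytes is purely illustrative and not part of Cassandra.

```python
# On-disk size in bytes of each commit log segment, copied from the ls -l listing above.
COMMIT_LOG_SIZES = {
    "CommitLog-1260995895123.log": 134_219_796,
    "CommitLog-1260997811317.log": 134_218_547,
    "CommitLog-1260998497744.log": 134_218_331,
    "CommitLog-1261000330587.log": 134_219_677,
    "CommitLog-1261000439079.log": 74_055_680,
}


def total_replay_bytes(sizes):
    """Sum the on-disk size of every segment in a {file name: bytes} mapping."""
    return sum(sizes.values())


total = total_replay_bytes(COMMIT_LOG_SIZES)
print(f"{len(COMMIT_LOG_SIZES)} segments, {total:,} bytes ({total / 2**20:.1f} MiB)")
```

This counts file bytes only (610,932,031 here); the "total 597228" line from ls counts allocated disk blocks, which is why it does not match exactly. Neither figure by itself says how much heap replaying those segments needs.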
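The "doing the math" about SSTable.INDEX_INTERVAL can be made explicit with a rough estimate of the heap taken by the sampled key index on one node. In this Python sketch, two inputs are assumptions rather than figures from the thread: an interval of 128 (what we believe the default SSTable.INDEX_INTERVAL was in this era; check SSTable.java in your own tree) and a hypothetical 100 bytes of heap per sampled key. The key count and the replication setup (RF=2 on 2 nodes, so every node holds every key) come from the message above; index_sample_estimate is an illustrative helper, not a Cassandra API.

```python
import math

# ASSUMPTION: sampling interval; verify SSTable.INDEX_INTERVAL in your source tree.
INDEX_INTERVAL = 128
# HYPOTHETICAL heap cost per sampled key in bytes (key string, file position, object overhead).
BYTES_PER_SAMPLE = 100


def index_sample_estimate(keys_on_node, interval=INDEX_INTERVAL, per_sample=BYTES_PER_SAMPLE):
    """Return (sampled keys, estimated bytes) if every interval-th key stays in memory."""
    samples = math.ceil(keys_on_node / interval)
    return samples, samples * per_sample


# RF=2 on a 2-node cluster: each node holds a replica of every key.
KEYS_PER_NODE = 70_420_082

# Compare the assumed default with larger intervals (fewer samples, slower reads).
for interval in (INDEX_INTERVAL, 2 * INDEX_INTERVAL, 4 * INDEX_INTERVAL):
    samples, est_bytes = index_sample_estimate(KEYS_PER_NODE, interval)
    print(f"interval={interval:4d}  samples={samples:,}  ~{est_bytes / 2**20:.1f} MiB")
```

Under these assumptions the sample is on the order of tens of MiB, and doubling the interval roughly halves it, which is the read-performance trade-off mentioned above. The sketch ignores memtables and everything else on the heap, so it sizes one component, not the whole JVM.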