From: Aaron Morton
To: user@cassandra.apache.org
Subject: Re: OutOfMemory exceptions w/ Cassandra 0.6.8
Date: Thu, 02 Dec 2010 00:55:53 GMT

Do you have a log message for the OOM, and some GC messages around it? Have you tried watching the server with jconsole?

Is the OOM happening on system start, after it has been running, or both?

Do you have any row/key caches? I cannot remember whether 0.6.* has it, but have you enabled the save cache feature?

Aaron
 
On 02 Dec, 2010, at 01:28 PM, Aram Ayazyan <ayazyan@gmail.com> wrote:

Hi,

We have a small cluster of 3 Cassandra servers running w/ full
replication. Every once in a while we get an OutOfMemory exception and
have to restart servers. Sometimes just restarting doesn't do it and
we have to clean the commitlog or data directory.

We are running Cassandra 0.6.8. There is only 1 keyspace and 3 column
families. There are less than 1000 keys across all column families.
There is roughly 1 write request per second and 1 read request. Each
server is allocated 1GB. Size of all files in data directory of the
only column family is ~300MB. MemtableThroughputInMB is throttled way
down to 2 and BinaryMemtableThroughputInMB to 8 (w/ higher values we
were running out of memory extremely fast, this way it works for a
couple of days w/o crashing).

Last time this issue happened, I didn't clear the commitlog/data
folders, enabled gc logging and restarted Cassandra. It crashes really
fast, but what is really strange is that it seems like it still has
plenty of memory when the error happens, last 3 lines from gc log:
21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]
21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]
21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]
The full log is here: http://pastebin.com/XGRSRcBd

I've tried increasing the memory up to 1.5GB, but it still doesn't
start.

Any ideas what might be the problem here?

Thank you,
Aram
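
For anyone reading the three GC log lines quoted above, here is a minimal sketch (not part of the original thread) that turns them into heap-occupancy figures. It assumes the usual HotSpot `-verbose:gc` layout, `timestamp: [GC beforeK->afterK(totalK), pause secs]`, with the parenthesised value taken to be the current heap capacity. The names `GC_PATTERN` and `parse_gc_line` are made up for this illustration.

```python
import re

# The three GC log lines quoted in the message, copied verbatim.
GC_LINES = [
    "21.408: [GC 437098K->436592K(1046464K), 0.0986800 secs]",
    "21.520: [GC 453616K->453117K(1046464K), 0.0967770 secs]",
    "21.629: [GC 470141K->469436K(1046464K), 0.0383520 secs]",
]

# Assumed layout: timestamp: [GC <before>K-><after>K(<total>K), <pause> secs]
GC_PATTERN = re.compile(
    r"(?P<ts>\d+\.\d+): \[GC (?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<total>\d+)K\), (?P<pause>\d+\.\d+) secs\]"
)


def parse_gc_line(line):
    """Parse one minor-GC line into a dict of numeric fields (sizes in KB)."""
    match = GC_PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognised GC log line: %r" % line)
    before = int(match.group("before"))
    after = int(match.group("after"))
    total = int(match.group("total"))
    return {
        "timestamp_s": float(match.group("ts")),
        "before_kb": before,
        "after_kb": after,
        "total_kb": total,
        "pause_s": float(match.group("pause")),
        # How much this collection reclaimed.
        "freed_kb": before - after,
        # Heap occupancy after the collection, as a share of capacity.
        "used_pct": 100.0 * after / total,
    }


records = [parse_gc_line(line) for line in GC_LINES]

for rec in records:
    print(
        "t=%.3fs  freed=%dK  used after GC=%.1f%%  pause=%.4fs"
        % (rec["timestamp_s"], rec["freed_kb"], rec["used_pct"], rec["pause_s"])
    )
```

On these three lines, each collection frees less than 1 MB and occupancy after GC stays below 45% of the reported 1046464K, which matches the poster's observation that the heap does not look full when the error occurs. This only restates what the quoted log shows; it does not by itself identify the cause of the OutOfMemory error.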