From: Santal Li
To: cassandra-user@incubator.apache.org
Subject: Re: cassandra freezes
Date: Sat, 20 Feb 2010 11:40:46 +0800

I am seeing almost the same thing as you. When I run write benchmarks,
one Cassandra node will sometimes freeze; the other nodes consider it
down, and it comes back up after 30+ seconds. I am using 5 nodes, each
with 8G of memory for the Java heap.

From my investigation, the freezes are caused by the GC thread. I started
JConsole and monitored heap usage: each time a GC happened, heap usage
dropped from 6G to 1G, and when I checked the Cassandra log, the freezes
happened at exactly the same times.

So I think that with a large heap (>2G), a GC strategy other than the
default one provided by the Cassandra launch script may be needed. Has
anyone else run into this situation? If so, can you please provide some
guidance?

Thanks
-Santal

2010/2/17 Tatu Saloranta <tsaloranta@gmail.com>

> On Tue, Feb 16, 2010 at 6:25 AM, Boris Shulman <shulmanb@gmail.com> wrote:
> > Hello, I'm running some benchmarks on 2 Cassandra nodes, each running
> > on an 8-core machine with 16G RAM, 10G for the Java heap. I've noticed
> > that during benchmarks with numerous writes Cassandra just freezes for
> > several minutes (in those benchmarks I'm writing batches of 10 columns
> > with 1K data each for every key in a single CF). Usually after
> > performing 50K writes I'm getting a TimeOutException and Cassandra
> > just freezes. What configuration changes can I make in order to
> > prevent this? Is it possible that my setup just can't handle the load?
> > How can I calculate the number of Cassandra nodes for a desired load?
>
> One thing that can cause seeming lockups is the garbage collector. So
> enabling GC debug output would be helpful, to see GC activity. Some
> collectors (CMS specifically) can stop the system for a very long time,
> up to minutes. This is not necessarily the root cause, but is easy to
> rule out.
> Beyond this, getting a stack trace during a lockup would make sense.
> That can pinpoint what threads are doing, or what they are blocked on
> in case there is a deadlock or heavy contention on some shared
> resource.
>
> -+ Tatu +-
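
To make the GC check discussed in this thread concrete, here is a minimal sketch that reads the same kind of numbers JConsole shows (used heap, collection count, accumulated collection time) through the standard `java.lang.management` beans. The class name `GcProbe` and its method names are hypothetical, and the sketch reads the beans of its own JVM; watching a Cassandra node would mean reading the same beans over a remote JMX connection, as JConsole does. The reported collection time is what the JVM accounts to each collector, so treat it as an approximation of pause time rather than an exact measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Cassandra: samples heap usage and GC totals.
final class GcProbe {

    // Total number of collections reported by all collectors in this JVM.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means "undefined" for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Accumulated collection time in milliseconds, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means "undefined" for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Heap currently in use, in bytes.
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // One line per sample; a sharp drop in heapUsedMB together with a jump in
    // gcMillis is the pattern described in the message above.
    static String snapshot() {
        return String.format("heapUsedMB=%d gcCount=%d gcMillis=%d",
                usedHeapBytes() / (1024 * 1024), totalGcCount(), totalGcMillis());
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

Comparing the timestamps of large `gcMillis` jumps with the freeze times in the Cassandra log is one way to confirm or rule out GC as the cause. The JVM's own `-verbose:gc` option gives similar information without any code.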

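
For the stack trace suggested in the quoted reply, the following is a rough sketch of what such a dump contains, again using only the standard `java.lang.management` API. The class name `StallDump` and the method `dumpThreads` are hypothetical, and the code dumps the threads of the JVM it runs in; for a running Cassandra process the JDK's `jstack <pid>` tool produces a similar dump from outside.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Cassandra: prints thread states, the lock
// each blocked thread is waiting on, and any deadlock the JVM can detect.
final class StallDump {

    static String dumpThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        // Both calls return null when no deadlock is found.
        long[] deadlocked = synchronizers
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();

        StringBuilder out = new StringBuilder();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length)
           .append('\n');

        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The lock this thread is blocked or waiting on, if any.
                out.append(" waiting on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}
```

Many threads in the `BLOCKED` state waiting on the same lock would point to contention on a shared resource, while a non-zero deadlock count points to a deadlock. If neither shows up in a dump taken during a freeze, a GC pause becomes the more likely explanation.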