From: Mohit Anchlia <mohitanchlia@gmail.com>
Subject: Re: Reduce Cassandra GC
Date: Sat, 15 Jun 2013 12:44:51 -0700
To: user@cassandra.apache.org

Can you paste your GC config? Also, can you take a heap dump at two different points in time so that we can compare them?

A quick thing to do would be to take a live heap histogram at two points and compare them.
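For example, a rough sketch (the pid and file names are just placeholders for your node; note that -histo:live forces a full GC before counting):

    jmap -histo:live <cassandra-pid> > histo-before.txt
    # wait until the heap has grown again / the long pauses reappear
    jmap -histo:live <cassandra-pid> > histo-after.txt
    diff histo-before.txt histo-after.txt | head -40

Diffing the two histograms should show which classes account for the growth.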

Sent from my iPhone

On Jun 15, 2013, at 6:57 AM, Takenori Sato <tsato@cloudian.com> wrote:

INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600

This says a New Generation (ParNew) GC took that long, which is usually unlikely.

The only situation I am aware of is when a fairly large object is created that cannot be promoted to the Old Generation, because it requires a large *contiguous* block of memory that is unavailable at that point in time. This is called a promotion failure. The collection then has to wait until the concurrent collector frees a large enough space, so you experience a stop-the-world pause. Strictly speaking, though, I think it is not the whole world that stops, only the new generation.

For example, in the case of Cassandra, a large value of in_memory_compaction_limit_in_mb can cause this. It is the limit up to which a compaction merges the rows of a key into the latest version in memory, so it can create a single byte array as large as that limit.
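(For reference, that is the in_memory_compaction_limit_in_mb setting in cassandra.yaml; if I remember correctly the default is 64, i.e.

    in_memory_compaction_limit_in_mb: 64

so a much larger value means correspondingly larger single allocations during compaction.)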

You can confirm this by enabling promotion failure GC logging going forward, and by checking which compactions were executing at that point in time.
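One way to do that, as a sketch (assuming the usual cassandra-env.sh setup; the exact flags depend on your JVM version), is to add something like:

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

and then, when a long ParNew shows up, cross-check system.log for compactions running around the same timestamp, e.g.

    grep -i compact /var/log/cassandra/system.log | grep '2013-04-15 14:0'

(the log paths are the package-install defaults; adjust for your layout).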



On Sat, Jun 15, 2013 at 10:01 AM, Robert Coli <rcoli@eventbrite.com> wrote:
On Fri, Jun 7, 2013 at 12:42 PM, Igor <igor@4friends.od.ua> wrote:
> If you are talking about 1.2.x, then I also have memory problems on an idle
> cluster: Java memory constantly grows slowly up to the limit, then spends a long
> time in GC. I never saw such behaviour on 1.0.x and 1.1.x, where on an idle
> cluster the Java memory stays at the same value.

If you are not aware of a pre-existing JIRA, I strongly encourage you to:

1) Document your experience of this.
2) Search issues.apache.org for anything that sounds similar.
3) If you are unable to find a JIRA, file one.

Thanks!

=Rob
