From: Peter Schuller
To: user@cassandra.apache.org
Subject: Re: Garbage collection freezes cassandra node
Date: Mon, 19 Dec 2011 21:35:38 -0800

> During the garbage collections, Cassandra freezes for about ten seconds. I observe the following log entries:
>
> “GC for ConcurrentMarkSweep: 11597 ms for 1 collections, 1887933144 used; max is 8550678528”

Ok, first off: are you certain that it is actually pausing, or are you assuming that because of the log entry above? The log entry in no way indicates a 10 second pause. It only indicates that a CMS cycle took a bit over 10 seconds from start to finish, which is entirely expected: most of CMS's work is concurrent and implies only short pauses. A full pause can happen, but that log entry is expected and is not in and of itself indicative of a stop-the-world 10 second pause.

With the CMS collector it is fully expected that heap usage follows a sawtooth pattern as the young generation is collected, followed by a sudden drop when CMS does its job concurrently, without pausing the application for a long period of time.

I will second the recommendation to run with -XX:+DisableExplicitGC (or -XX:+ExplicitGCInvokesConcurrent) to eliminate explicit System.gc() calls as a source. I would also run with -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps and report back the results (i.e., the GC log around the time of the pause).

Your graph looks very unusual for CMS. It's possible that everything is otherwise as it should be and CMS is simply kicking in too late, but I am skeptical of that given the extremely smooth look of your graph.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
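
As a rough aid for reading the quoted log entry, here is a minimal Python sketch that pulls the numbers out of such a line and reports the elapsed time and heap occupancy. The regular expression is an assumption based only on the single line quoted in this thread, and the function name `summarize_gc_line` is made up for illustration. Note that the duration it reports is the elapsed time of the collection as logged, not a measured stop-the-world pause.

```python
import re

# Field layout inferred from the one log line quoted in this thread (an assumption).
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def summarize_gc_line(line):
    """Return a dict describing the GC log line, or None if it does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    used = int(m.group("used"))
    heap_max = int(m.group("max"))
    return {
        "collector": m.group("collector"),
        # Elapsed time of the collection(s) as logged; NOT a pause duration.
        "elapsed_s": int(m.group("ms")) / 1000.0,
        "collections": int(m.group("count")),
        # Fraction of the maximum heap in use when the line was written.
        "heap_fraction": used / heap_max,
    }


if __name__ == "__main__":
    line = ("GC for ConcurrentMarkSweep: 11597 ms for 1 collections, "
            "1887933144 used; max is 8550678528")
    info = summarize_gc_line(line)
    print("%s: %.1f s elapsed, heap %.0f%% full"
          % (info["collector"], info["elapsed_s"], 100 * info["heap_fraction"]))
```

For the line in question this shows the heap at roughly 22% of its maximum after the cycle. Whether the application was actually stopped during those 11.6 seconds can only be told from the detailed GC log produced by the -XX:+PrintGCDetails family of flags.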