Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 008339397 for ; Sun, 25 Sep 2011 10:59:51 +0000 (UTC) Received: (qmail 54885 invoked by uid 500); 25 Sep 2011 10:59:49 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 54862 invoked by uid 500); 25 Sep 2011 10:59:49 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 54853 invoked by uid 99); 25 Sep 2011 10:59:49 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 25 Sep 2011 10:59:48 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of scode@scode.org designates 74.125.82.44 as permitted sender) Received: from [74.125.82.44] (HELO mail-ww0-f44.google.com) (74.125.82.44) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 25 Sep 2011 10:59:41 +0000 Received: by wwf22 with SMTP id 22so4166855wwf.25 for ; Sun, 25 Sep 2011 03:59:21 -0700 (PDT) MIME-Version: 1.0 Received: by 10.216.229.93 with SMTP id g71mr7440725weq.14.1316948360097; Sun, 25 Sep 2011 03:59:20 -0700 (PDT) Sender: scode@scode.org Received: by 10.216.82.136 with HTTP; Sun, 25 Sep 2011 03:59:19 -0700 (PDT) X-Originating-IP: [212.238.87.26] In-Reply-To: References: Date: Sun, 25 Sep 2011 12:59:19 +0200 X-Google-Sender-Auth: CB-Q68TlGKFwJyvlvoPZp5lYAC8 Message-ID: Subject: Re: CMS GC initial-mark taking 6 seconds , bad? From: Peter Schuller To: user@cassandra.apache.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org > I see the following in my GC log > > 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)] > 13749939K(49807360K), 6.0696680 secs] [Times: user=3D6.10 sys=3D0.00, > real=3D6.07 secs] > > so there is a stop-the-world period of 6 seconds. does this sound bad > ? or 6 seconds is OK =C2=A0and we should expect the built-in > fault-tolerance of Cassandra handle this? initial-mark pauses are stop-the-world, so a 6 second initial-mark would have paused the node for those 6 seconds. The initial mark is essentially marking roots for old-gen; that should include thread stacks and such, but will also include younger generations. You might read [1] which talks a bit about it; a recommendation there is to make sure that initial marks happen right after a young-gen collection, and they advise increasing heap size sufficiently to allow an ininitial mark to trigger (I suppose by heuristics) after the young gen collection, prior to the CMS trigger. It makes sense, especially given that initial-mark is single-threaded, to try do to that (and leave the young-gen smaller, collected by the parallel collector). However I'm not entirely clear on what VM options are required for this. I had a brief look at the code but it wasn't obvious at cursory glance under what circumstances an initial mark is triggered right after young-gen vs. not. In your case you clearly have enough heap. Can you correlate with ParNew collections and see if the initial mark pauses seem to happen immediately after a ParNew, or somewhere in between, in the cases where they take this long? Also, as a mitigationg: What's your young generation size? One way to mitigate the problem, if it is indeed the young gen marking that is taking time, is to decrease the size of the young generation to leave less work for initial marking. Normally the young gen is sized based on expected pause times given parallel ParNew ollections, but if a non-parallel initial-mark is having to do marking of the same contents the pause time could be higher (hence the discussion above). Also, is each initial mark this long, or is that something that happens once in a while? As for Cassandra dealing with it: It is definitely not a good thing to have 6 second pauses. Even with all other nodes up, it takes time for the dynamic snitch to realize what's going on and you will tend to see a subset of requests to the cluster get 'stuck' in circumstances like that. Also, if you're e.g. doing QUORUM at RF=3D3, if a node is down for legitimate reasons, another node having a 6 second pause will by necessity cause high latency for requests during that period. [1] http://answerpot.com/showthread.php?1558705-CMS+initial+mark+pauses --=20 / Peter Schuller (@scode on twitter)