Date: Tue, 31 Mar 2015 13:24:53 +0000 (UTC)
From: "Adam Hattrell (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8907) Raise GCInspector alerts to WARN

[ https://issues.apache.org/jira/browse/CASSANDRA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388510#comment-14388510 ]

Adam Hattrell commented on CASSANDRA-8907:
------------------------------------------

Sorry, I've only just come back to this. I was only talking about those 200 ms pauses. Anything that large will cause serious disruption for any enterprise-level user. To my mind these should be WARN level regardless of your environment; they are "bad" by default. I see some users who aim for a 5 ms SLA.
For them, a 200 ms pause is probably way too high; they would like to know about much smaller pauses, so a tunable threshold would be awesome.

As for users reading their logs: most admins set alerts to fire on WARN and ERROR. Having to write manual greps to pull out GCInspector messages (or dropped mutations, which is another bugbear) is a PITA, and really shouldn't be necessary.

> Raise GCInspector alerts to WARN
> --------------------------------
>
>                 Key: CASSANDRA-8907
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8907
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Adam Hattrell
>
> I'm fairly regularly running into folks wondering why their applications are reporting down nodes. Yet, they report, when they grepped the logs they found no WARNs or ERRORs.
> Nine times out of ten, when I look through the logs we see a ton of ParNew or CMS GC pauses similar to the following:
> INFO [ScheduledTasks:1] 2013-03-07 18:44:46,795 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 1835 ms for 3 collections, 2606015656 used; max is 10611589120
> INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 GCInspector.java (line 122) GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864
> To my mind these should be WARNs, as they have the potential to significantly impact the cluster's performance as a whole.
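To make the tunable-threshold idea concrete, here is a minimal Python sketch, assuming a hypothetical log post-processing step rather than anything in Cassandra itself. It parses the GCInspector line format quoted in the issue and maps each event to a log level using a configurable threshold that defaults to the 200 ms figure from the comment. The names `GcEvent`, `parse_gc_line`, `level_for` and `warn_threshold_ms` are illustrative only and are not Cassandra APIs or settings.

```python
import logging
import re
from typing import NamedTuple, Optional

# Matches the GCInspector INFO lines quoted in the issue, e.g.
# "GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864"
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


class GcEvent(NamedTuple):
    collector: str      # e.g. "ParNew" or "ConcurrentMarkSweep"
    total_ms: int       # total GC time reported across all collections in the interval
    collections: int    # number of collections the total covers
    used_bytes: int
    max_bytes: int


def parse_gc_line(line: str) -> Optional[GcEvent]:
    """Return a GcEvent if the line contains a GCInspector summary, else None."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    return GcEvent(m["collector"], int(m["ms"]), int(m["count"]),
                   int(m["used"]), int(m["max"]))


def level_for(event: GcEvent, warn_threshold_ms: int = 200) -> int:
    """Hypothetical policy: WARN when reported GC time meets or exceeds the threshold."""
    return logging.WARNING if event.total_ms >= warn_threshold_ms else logging.INFO


if __name__ == "__main__":
    line = ("INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 GCInspector.java (line 122) "
            "GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864")
    event = parse_gc_line(line)
    print(logging.getLevelName(level_for(event)))                                          # WARNING
    print(logging.getLevelName(level_for(event._replace(total_ms=150))))                   # INFO
    print(logging.getLevelName(level_for(event._replace(total_ms=12), warn_threshold_ms=5)))  # WARNING
```

The sketch compares the reported total because each GCInspector line aggregates several collections; dividing by `collections` would give an average per collection instead (9866 ms over 8 ParNew collections is roughly 1233 ms each). Lowering `warn_threshold_ms` covers the 5 ms SLA users mentioned in the comment.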