cassandra-commits mailing list archives

From "Anuj Wadehra (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8907) Raise GCInspector alerts to WARN
Date Mon, 07 Sep 2015 14:11:46 GMT


Anuj Wadehra commented on CASSANDRA-8907:

I am adding a 3rd scenario to the 2 scenarios I mentioned in my earlier comment:

3. The GC warn threshold is enabled by default and set to 5000 ms. Suppose an application is NOT
sensitive to GC pauses, e.g. some background job. Even though no functionality is impacted
and the application SLA is being met, it is getting 5+ second GC pauses in the background. When
the user upgrades Cassandra, he will start getting warnings for every GC pause over 5 seconds.
I wouldn't call that 'breaking' the existing log monitoring system with new warnings. Warnings
are warnings, "an indication of a possible problem", not errors. Any GC pause over 5 seconds indicates
poor heap tuning or an insufficient heap. After the upgrade, the user should start getting these warnings
so that he can look at options for optimizing JVM tuning.

Of the 3 scenarios I mentioned, scenarios 2 and 3 support enabling this property by
default and setting its value to something like 5+ seconds, so that the user is aware of possible problems
with GC tuning upfront. If the user is warned and still wants to continue with long GC pauses,
he can increase the GC warn threshold. But it should be Cassandra's responsibility to make
the user aware of possible problems by raising a warning, especially when we have a GCInspector which
is monitoring GC pauses.
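
To make the proposal concrete, here is a minimal sketch of the threshold check being discussed. It is not the attached patch or Cassandra's actual GCInspector code: the class name, method name, and the "threshold <= 0 disables warnings" convention are assumptions made for illustration. Only the 5000 ms default and the two sample durations come from this ticket.

```java
// Hypothetical helper, not Cassandra's GCInspector: picks a log level for a GC duration.
final class GcPauseLogLevel {
    // Default proposed in the comment above (5000 ms).
    static final long DEFAULT_WARN_THRESHOLD_MS = 5000;

    /**
     * Returns "WARN" when the GC duration is strictly over the threshold, else "INFO".
     * Assumption: a threshold of 0 or less means warnings are disabled.
     */
    static String levelFor(long gcDurationMs, long warnThresholdMs) {
        if (warnThresholdMs > 0 && gcDurationMs > warnThresholdMs) {
            return "WARN";
        }
        return "INFO";
    }

    public static void main(String[] args) {
        // Durations taken from the two sample log lines in the issue description.
        long[] durationsMs = {1835, 9866};
        for (long d : durationsMs) {
            System.out.println(levelFor(d, DEFAULT_WARN_THRESHOLD_MS) + " GC duration " + d + " ms");
        }
    }
}
```

A user who accepts long pauses would pass a higher threshold (for example 15000) to quiet the warnings, which matches the "increase the GC warn threshold" option above.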

> Raise GCInspector alerts to WARN
> --------------------------------
>                 Key: CASSANDRA-8907
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Adam Hattrell
>            Assignee: Amit Singh Chowdhery
>              Labels: patch
>         Attachments: cassnadra-8907.patch
> I'm fairly regularly running into folks wondering why their applications are reporting down nodes. Yet, they report, when they grepped the logs they had no WARNs or ERRORs listed.
> Nine times out of ten, when I look through the logs we see a ton of ParNew or CMS GC pauses occurring, similar to the following:
> INFO [ScheduledTasks:1] 2013-03-07 18:44:46,795 (line 122) GC for ConcurrentMarkSweep: 1835 ms for 3 collections, 2606015656 used; max is 10611589120
> INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 (line 122) GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864
> To my mind these should be WARNs, as they have the potential to significantly impact the cluster's performance as a whole.

This message was sent by Atlassian JIRA
