cassandra-commits mailing list archives

From "Anuj Wadehra (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8907) Raise GCInspector alerts to WARN
Date Fri, 04 Sep 2015 12:44:46 GMT


Anuj Wadehra commented on CASSANDRA-8907:

Hi Joshua McKenzie, I have a different thought on this. I think this property should be enabled
by default: if the property is missing, we should fall back to a reasonably high GC warn
limit, say 1000 ms.
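
A minimal sketch of the fallback described above, not the actual patch. The property name gc_warn_threshold_ms, the class name, and the method names are hypothetical; only the 1000 ms default comes from this comment. A pause longer than the threshold is reported at WARN, anything else stays at INFO.

```java
import java.util.Properties;

public class GcWarnThresholdSketch {
    // Fallback proposed in the comment: 1000 ms when the property is missing.
    static final long DEFAULT_GC_WARN_THRESHOLD_MS = 1000L;

    // Hypothetical property name, used only for illustration.
    static final String PROPERTY = "gc_warn_threshold_ms";

    // Returns the configured threshold, or the default if the property is absent or blank.
    static long resolveThresholdMs(Properties config) {
        String raw = config.getProperty(PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_GC_WARN_THRESHOLD_MS;
        }
        return Long.parseLong(raw.trim());
    }

    // Chooses the log level for a GC pause of the given duration.
    static String levelFor(long pauseMs, long thresholdMs) {
        return pauseMs > thresholdMs ? "WARN" : "INFO";
    }

    public static void main(String[] args) {
        // No property set, so the 1000 ms default applies.
        long threshold = resolveThresholdMs(new Properties());

        // Pause durations taken from the log lines quoted in the issue description.
        System.out.println(levelFor(1835, threshold)); // prints "WARN"
        System.out.println(levelFor(9866, threshold)); // prints "WARN"
        System.out.println(levelFor(200, threshold));  // prints "INFO"
    }
}
```

With this shape, operators who never touch the setting still get WARN lines for long pauses, and those who need a different limit can override it.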

I think that, no matter what kind of application it is, there is always a GC WARN limit for it,
considering the following factors:
1. Application throughput requirements / Service Level Agreements, OR
2. phi_convict_threshold: larger GC pauses may lead to nodes being marked down, which is unacceptable
for any application. I think that no application can afford unreasonably high GC pauses, say
more than 60 secs.

Many production systems patrol their logs by searching for the keywords "ERROR" and "WARN",
so any unreasonably high GC pause should be logged at WARN level; otherwise nodes will be marked down
without any warnings.

> Raise GCInspector alerts to WARN
> --------------------------------
>                 Key: CASSANDRA-8907
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Adam Hattrell
>              Labels: patch
>         Attachments: cassnadra-8907.patch
> I'm fairly regularly running into folks wondering why their applications are reporting down nodes. Yet they report that when they grep the logs, no WARNs or ERRORs are listed.
> Nine times out of ten, when I look through the logs, I see a ton of ParNew or CMS GC pauses similar to the following:
> INFO [ScheduledTasks:1] 2013-03-07 18:44:46,795 (line 122) GC for ConcurrentMarkSweep: 1835 ms for 3 collections, 2606015656 used; max is 10611589120
> INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 (line 122) GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864
> To my mind these should be WARNs, as they have the potential to significantly impact the cluster's performance as a whole.
