hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9618) Add thread which detects JVM pauses
Date Thu, 06 Jun 2013 16:41:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677214#comment-13677214 ]

Todd Lipcon commented on HADOOP-9618:

Hey Hitesh. We already have the "EventCounter" log4j appender. Rather than making one-off
metrics for this, I think we should just extend the EventCounter to take a list of regular
expressions that map matching log messages to metrics, e.g. you could say something like the
following in the log4j configuration:

log4j.appender.EventCounter.WARN.gc-pauses=Detected pause in JVM

Does that seem like a more general way of achieving the above?
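
To make the proposal concrete, here is a minimal sketch of the matching logic such an extension might use. It is not the existing EventCounter code and does not depend on log4j: the class RegexEventCounterSketch and its methods addMetric, onLogMessage, and count are hypothetical names, and filtering by log level (the WARN part of the key above) is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: maps metric names to regular expressions and counts
 * how many log messages match each one. Not part of Hadoop or log4j.
 */
final class RegexEventCounterSketch {
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();
    private final Map<String, AtomicLong> counts = new LinkedHashMap<>();

    /** Registers a metric; assumed to be called during configuration, before logging starts. */
    void addMetric(String name, String regex) {
        patterns.put(name, Pattern.compile(regex));
        counts.put(name, new AtomicLong());
    }

    /** Increments every metric whose pattern occurs anywhere in the message. */
    void onLogMessage(String message) {
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            if (e.getValue().matcher(message).find()) {
                counts.get(e.getKey()).incrementAndGet();
            }
        }
    }

    /** Returns the current count, or 0 for a metric that was never registered. */
    long count(String name) {
        AtomicLong c = counts.get(name);
        return c == null ? 0 : c.get();
    }
}
```

With a metric registered as addMetric("gc-pauses", "Detected pause in JVM"), each log message containing that text would bump the gc-pauses counter, while other messages would leave it unchanged.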

bq. FWIW, -XX:+UseGCLogFileRotation is available in JDK 6u34 and 7u2+.
Thanks, I forgot about that new feature. Still, it's nicer to have this info exposed via log4j,
and with a consistent format (the Java GC logs keep changing format and also look different
depending on which collector you're using, if I recall correctly).
> Add thread which detects JVM pauses
> -----------------------------------
>                 Key: HADOOP-9618
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9618
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-9618.txt
> Users often struggle to understand what happened when a long JVM pause (GC or otherwise)
causes things to malfunction inside a Hadoop daemon. For example, a long GC pause while logging
an edit to the QJM may cause the edit to time out, or a long GC pause may cause other IPCs to
the NameNode to time out. We should add a simple thread which loops on 1-second sleeps, and if
the sleep ever takes significantly longer than 1 second, logs a WARN. This will make GC pauses
obvious in logs.
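
The loop described above can be sketched roughly as follows. This is an illustration only, not the contents of hadoop-9618.txt: the class name PauseMonitorSketch, the helpers extraSleepMs and shouldWarn, the 1000 ms warning threshold, and the exact wording after "Detected pause in JVM" are all assumptions. The warning sink is a plain Consumer<String> so the sketch does not depend on a logging library.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a pause-detecting thread body; not the attached patch. */
final class PauseMonitorSketch implements Runnable {
    private static final long SLEEP_MS = 1000;          // 1-second sleeps, per the description
    private static final long WARN_THRESHOLD_MS = 1000; // assumed meaning of "significantly longer"

    private final Consumer<String> warn;   // stand-in for a WARN-level logger
    private volatile boolean running = true;

    PauseMonitorSketch(Consumer<String> warn) {
        this.warn = warn;
    }

    /** Time slept beyond what was requested, in milliseconds, never negative. */
    static long extraSleepMs(long startNanos, long endNanos, long requestedMs) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0, elapsedMs - requestedMs);
    }

    /** True when the overshoot is large enough to report. */
    static boolean shouldWarn(long extraMs, long thresholdMs) {
        return extraMs > thresholdMs;
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime(); // monotonic, unaffected by wall-clock changes
            try {
                Thread.sleep(SLEEP_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and exit
                return;
            }
            long extra = extraSleepMs(start, System.nanoTime(), SLEEP_MS);
            if (shouldWarn(extra, WARN_THRESHOLD_MS)) {
                warn.accept("Detected pause in JVM: sleep overran by approximately "
                        + extra + " ms");
            }
        }
    }

    /** Asks the loop to exit after the current sleep finishes. */
    void stop() {
        running = false;
    }
}
```

A daemon could start it with something like new Thread(new PauseMonitorSketch(System.err::println), "pause-monitor").start(). Measuring with System.nanoTime() rather than System.currentTimeMillis() keeps clock adjustments from being reported as pauses.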

