hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9618) Add thread which detects JVM pauses
Date Tue, 04 Jun 2013 18:33:21 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Todd Lipcon updated HADOOP-9618:

    Attachment: hadoop-9618.txt

Attached patch implements this feature. I wasn't able to write an automated unit test for
this, since it relies on forcing the JVM to pause, which isn't really doable in a junit context.
I added a main function, though, which leaks memory and forces into GC. Here's the output
from my machine:

todd@todd-w510:~/git/hadoop-common/hadoop-common-project/hadoop-common$ java -verbose:gc -Xmx1g
-XX:+UseConcMarkSweepGC -cp /home/todd/confs/devconf.2nn-2/:target/classes/:/home/todd/git/hadoop-common/hadoop-dist/target/hadoop-2.0.0-cdh4.1.2/share/hadoop/common/lib/\*
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter
in all the log4j.properties files.
[GC 16715K->9327K(83008K), 0.0964960 secs]
[GC 26351K->23551K(83008K), 0.0406780 secs]
[GC 40575K->38828K(83008K), 0.0375710 secs]
[GC 39126K(83008K), 0.0027320 secs]
[GC 55852K->55851K(83008K), 0.0422420 secs]
[GC 70138K->69911K(88000K), 0.0437410 secs]
[GC 86935K->86934K(105028K), 0.0479130 secs]
[GC 103958K->103957K(121924K), 0.0572590 secs]
[GC 120981K->120804K(138824K), 0.0521890 secs]
[GC 137828K->137826K(155720K), 0.0508530 secs]
[GC 154850K->154848K(172808K), 0.0519540 secs]
[GC 163351K->163177K(181064K), 0.0307160 secs]
[GC 180201K->180201K(198160K), 0.0554300 secs]
[GC 197225K->197223K(215248K), 0.0660260 secs]
[GC 214247K->214244K(232144K), 0.0611980 secs]
[GC 231268K->231151K(249040K), 0.0543830 secs]
[GC 247289K->247117K(265168K), 0.0616800 secs]
[Full GC 247117K->225823K(265168K), 1.0242170 secs]
13/06/04 11:32:14 INFO util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC):
pause of approximately 1145ms
GC pool 'ParNew' had collection(s): count=2 time=117ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=2 time=445ms
[GC 268895K->268138K(424824K), 0.1054190 secs]
[GC 268825K(424824K), 0.0061170 secs]
[GC 311210K->311208K(424824K), 0.1476640 secs]
[GC 352421K->351750K(424824K), 0.1371280 secs]
[GC 394822K->394691K(454648K), 0.1522430 secs]
[GC 437763K->437760K(481144K), 0.1447820 secs]
[GC 480832K->486355K(529720K), 0.1565950 secs]
[GC 529427K->529894K(573304K), 0.1264970 secs]
[GC 547400K->546694K(590008K), 0.0609490 secs]
[Full GC 546694K->507462K(590008K), 2.6322880 secs]
13/06/04 11:32:19 INFO util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC):
pause of approximately 2735ms
GC pool 'ParNew' had collection(s): count=2 time=188ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=2 time=298ms
> Add thread which detects JVM pauses
> -----------------------------------
>                 Key: HADOOP-9618
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9618
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: util
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-9618.txt
> Often times users struggle to understand what happened when a long JVM pause (GC or otherwise)
causes things to malfunction inside a Hadoop daemon. For example, a long GC pause while logging
an edit to the QJM may cause the edit to timeout, or a long GC pause may make other IPCs to
the NameNode timeout. We should add a simple thread which loops on 1-second sleeps, and if
the sleep ever takes significantly longer than 1 second, log a WARN. This will make GC pauses
obvious in logs.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message