hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2907) dead datanodes because of OutOfMemoryError
Date Wed, 27 Feb 2008 04:58:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572790#action_12572790 ]

Raghu Angadi commented on HADOOP-2907:

I looked at one of these dead datanodes. The OutOfMemoryErrors seem to be an independent problem.
These errors (there are multiple of them) are in the .out file, without timestamps. On this
node, the .out file was last modified at 01:39, and the log file shows the DataNode continued
to function normally for some time after that.

The datanode seems to be stuck because one of its threads is blocked forever waiting for 'df'
to return while holding a central lock (FSDataset), and there is a zombie df process on the
machine. The offending stack trace:


"org.apache.hadoop.dfs.DataNode$DataXceiver@dc08c3" daemon prio=10 tid=0xae45e800 nid=0x2f3d
in Object.wait() [0x8cafe000..0x8caff030]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xed78c5f8> (a java.lang.UNIXProcess$Gate)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.UNIXProcess$Gate.waitForExit(UNIXProcess.java:64)
        - locked <0xed78c5f8> (a java.lang.UNIXProcess$Gate)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:145)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:115)
        at org.apache.hadoop.util.Shell.run(Shell.java:100)
        at org.apache.hadoop.fs.DF.getCapacity(DF.java:63)
        at org.apache.hadoop.dfs.FSDataset$FSVolume.getCapacity(FSDataset.java:307)
        at org.apache.hadoop.dfs.FSDataset$FSVolume.getAvailable(FSDataset.java:311)
        at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:393)
        - locked <0xb6551838> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
        at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:657)
        - locked <0xb6551838> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
        - locked <0xb653aec8> (a org.apache.hadoop.dfs.FSDataset)
        at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
        at java.lang.Thread.run(Thread.java:619)
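
To illustrate the failure mode, here is a minimal sketch. This is not Hadoop's actual Shell/DF code, and the class and method names below are made up for illustration: a thread shells out to 'df' while holding a shared lock, so if the child process never exits, every other thread that needs that lock piles up behind it. A bounded wait on the child process (available in newer JDKs) would let the caller kill a wedged 'df' and release the lock instead of hanging forever.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.TimeUnit;

// Minimal sketch, not Hadoop code: shows why waiting on an external
// command while holding a shared lock can wedge the whole datanode,
// and how a bounded wait avoids holding the lock indefinitely.
public class DfHangSketch {

    // Stand-in for the FSDataset/FSVolumeSet lock held in the trace above.
    private final Object volumeLock = new Object();

    // Unbounded variant, same shape as the stack trace above: if 'df'
    // never exits (e.g. it goes zombie), waitFor() blocks forever, the
    // lock is never released, and every other thread piles up behind it.
    long capacityUnbounded(String path) throws IOException, InterruptedException {
        synchronized (volumeLock) {
            Process p = new ProcessBuilder("df", "-k", path).start();
            p.waitFor();                              // can block forever
            return parseCapacityBytes(p);
        }
    }

    // Bounded variant: give 'df' a deadline, then kill it and fail this one
    // request instead of hanging the lock holder (needs a JDK with the
    // timed Process.waitFor).
    long capacityBounded(String path) throws IOException, InterruptedException {
        synchronized (volumeLock) {
            Process p = new ProcessBuilder("df", "-k", path).start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("'df' did not return within 10s for " + path);
            }
            return parseCapacityBytes(p);
        }
    }

    // Crude parse of 'df -k' output: skip the header, take the 1K-blocks
    // column of the first data line. Real code would be more defensive.
    private long parseCapacityBytes(Process p) throws IOException {
        try (BufferedReader r =
                 new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            r.readLine();                             // header line
            String[] fields = r.readLine().trim().split("\\s+");
            return Long.parseLong(fields[1]) * 1024L; // 1K-blocks -> bytes
        }
    }
}

Another option worth considering is to refresh the 'df' numbers outside the central lock and have callers read a cached value, so a wedged 'df' only stales the capacity stats instead of blocking block writes.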

We also need to find out why there are multiple OutOfMemoryErrors. My guess is that some of the
normally functioning datanodes will have them as well.

> dead datanodes because of OutOfMemoryError
> ------------------------------------------
>                 Key: HADOOP-2907
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2907
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
> We see more dead datanodes than in previous releases. The common exception is found in the out file:
> Exception in thread "org.apache.hadoop.dfs.DataBlockScanner@18166e5" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "DataNode: [dfs.data.dir-value]" java.lang.OutOfMemoryError: Java heap space

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
