hadoop-hdfs-issues mailing list archives

From "Abhijeet Apsunde (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7283) Bump DataNode OOM log from WARN to ERROR
Date Wed, 06 May 2015 17:11:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530938#comment-14530938 ]

Abhijeet Apsunde commented on HDFS-7283:

Is the DataNode's behavior in the different OOM cases documented anywhere?
We are facing the following problem on our cluster, but this issue appears to address only
the log level change. May I create a new issue for it?

We have several DataNodes failing because of OOM. The daemon still runs, transfers blocks,
deletes blocks, and responds as an active process (i.e. it is not dead), but it no longer
accepts incoming blocks. As a result, other HDFS nodes time out trying to transfer blocks to
it, which causes cascading problems such as job failures due to timeouts.

We are running HDFS from Hadoop v2.2.0.
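
To make the symptom concrete, here is a minimal Java sketch (not the actual DataNode source)
of a loop that catches OutOfMemoryError, logs it, and retries after a delay. The names
OomRetrySketch, Acceptor, and runAcceptLoop are hypothetical and exist only for illustration;
only the wording of the log message is taken from the issue description. Under those
assumptions it shows how a process can stay alive and responsive while its accept path keeps
stalling, and it logs at SEVERE, which is java.util.logging's closest analogue to ERROR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative sketch only; this is NOT the Hadoop DataNode implementation. */
public class OomRetrySketch {
    private static final Logger LOG = Logger.getLogger(OomRetrySketch.class.getName());

    /** Hypothetical stand-in for "accept one incoming block transfer". */
    public interface Acceptor {
        void acceptOne();
    }

    /**
     * Runs the accept loop for at most maxIterations and returns how many
     * iterations failed with OutOfMemoryError. retryDelayMillis plays the role
     * of the 30-second pause mentioned in the log message (30_000 would match it).
     */
    public static int runAcceptLoop(Acceptor acceptor, int maxIterations, long retryDelayMillis) {
        int oomCount = 0;
        for (int i = 0; i < maxIterations; i++) {
            try {
                acceptor.acceptOne();
            } catch (OutOfMemoryError oom) {
                oomCount++;
                // The error is swallowed here, so the process stays alive,
                // but nothing is accepted until the pause is over.
                LOG.log(Level.SEVERE,
                        "DataNode is out of memory. Will retry in " + retryDelayMillis + " ms.",
                        oom);
                try {
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return oomCount;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated failure: the first two attempts throw, the rest succeed.
        Acceptor flaky = () -> {
            if (attempts[0]++ < 2) {
                throw new OutOfMemoryError("unable to create new native thread");
            }
        };
        int ooms = runAcceptLoop(flaky, 5, 10);
        System.out.println("OOM count: " + ooms + ", accept attempts: " + attempts[0]);
    }
}
```

If the out-of-memory condition never clears, a loop of this shape keeps pausing and retrying
indefinitely, which would be consistent with a daemon that looks alive but does not accept
new blocks. Whether that is what happens on our v2.2.0 cluster is an assumption, not something
this sketch proves.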

> Bump DataNode OOM log from WARN to ERROR
> ----------------------------------------
>                 Key: HDFS-7283
>                 URL: https://issues.apache.org/jira/browse/HDFS-7283
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Stephen Chu
>            Assignee: Stephen Chu
>            Priority: Trivial
>              Labels: supportability
>             Fix For: 2.7.0
>         Attachments: HDFS-7283.1.patch
> When the DataNode OOMs, it logs the following WARN message which should be bumped up
> to ERROR because DataNode OOM often leads to DN process abortion.
> {code}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode is out of memory. Will retry in 30 seconds.
> 4751 java.lang.OutOfMemoryError: unable to create new native thread
> {code}
> Thanks to Roland Teague for identifying this.

This message was sent by Atlassian JIRA
