From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Date: Thu, 28 Feb 2008 17:05:51 -0800 (PST)
Subject: [jira] Commented: (HADOOP-2907) dead datanodes because of OutOfMemoryError
Message-ID: <1563907347.1204247151238.JavaMail.jira@brutus>
In-Reply-To: <1667842456.1204075251034.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573541#action_12573541 ]

Raghu Angadi commented on HADOOP-2907:
--------------------------------------

Slow reads are the same problem as before; nothing has changed in terms of buffering. But slow writes now have a bigger penalty in terms of buffer space used.

Before 0.16, it did not matter how slowly the client wrote: DFSClient cached the full block on its local disk and then streamed the whole block quickly to the DataNode. In 0.16, the client keeps connections open pretty much as long as the output stream is open. For each client connection, two of the datanodes in the pipeline hold 4 buffers and the last datanode holds 3, each of size io.file.buffer.size. This might explain why you are seeing more OutOfMemory errors now.

We still need to find out how many connections are open. We can expect some datanodes to carry several times the average load. Most of the time these errors are caught by the DataNode data transfer threads; we could print all the stack traces (or some equally useful info) for these exceptions to the log once every few minutes. The stack traces show how many reads and writes are going on. Let me know if I should prepare a patch.
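For a rough sense of scale, here is a back-of-the-envelope sketch of the heap those buffers can consume on a single datanode. The buffer size and connection count below are illustrative assumptions, not measured values from any cluster:

{code}
// Rough estimate of DataNode heap held by write-pipeline buffers in 0.16,
// using the figures above: up to 4 buffers per client connection on a
// datanode, each of io.file.buffer.size bytes. The inputs are assumptions.
public class PipelineBufferEstimate {
  public static void main(String[] args) {
    long bufferSize = 64 * 1024;   // io.file.buffer.size -- 64 KB assumed for illustration
    int openConnections = 500;     // concurrent write pipelines on one datanode -- assumed
    int buffersPerConnection = 4;  // worst case: datanode is not the last in the pipeline

    long heapForBuffers = bufferSize * buffersPerConnection * (long) openConnections;
    System.out.printf("~%d MB of heap just for stream buffers%n",
        heapForBuffers / (1024 * 1024));
    // 64 KB * 4 * 500 = ~125 MB, enough to push a datanode with a small -Xmx over the edge.
  }
}
{code}

With a small 4 KB buffer the same math gives only about 8 MB, so the pressure grows quickly on clusters that raise io.file.buffer.size and carry many concurrent writers.

And a minimal sketch of the rate-limited logging suggested above, i.e. printing the stack traces only once every few minutes. The class and method names here are hypothetical, not existing DataNode code:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical helper: a data transfer thread's catch block would call
// log("writeBlock failed", t); full stack traces are emitted at most once
// per interval, with a count of how many similar errors were suppressed.
public class ThrottledExceptionLogger {
  private static final Log LOG = LogFactory.getLog(ThrottledExceptionLogger.class);
  private static final long LOG_INTERVAL_MS = 5 * 60 * 1000; // "every few minutes" -- 5 min assumed

  private long lastLogTime = 0;
  private long suppressed = 0;

  public synchronized void log(String context, Throwable t) {
    long now = System.currentTimeMillis();
    if (now - lastLogTime >= LOG_INTERVAL_MS) {
      LOG.warn(context + " (" + suppressed + " similar errors suppressed since last report)", t);
      lastLogTime = now;
      suppressed = 0;
    } else {
      suppressed++;  // stay quiet until the interval expires
    }
  }
}
{code}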
> dead datanodes because of OutOfMemoryError
> ------------------------------------------
>
>                 Key: HADOOP-2907
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2907
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>
> We see more dead datanodes than in previous releases. The common exception is found in the out file:
> Exception in thread "org.apache.hadoop.dfs.DataBlockScanner@18166e5" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "DataNode: [dfs.data.dir-value]" java.lang.OutOfMemoryError: Java heap space

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.