From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Date: Thu, 28 Feb 2008 17:05:51 -0800 (PST)
Subject: [jira] Commented: (HADOOP-2907) dead datanodes because of OutOfMemoryError
Message-ID: <1563907347.1204247151238.JavaMail.jira@brutus>
In-Reply-To: <1667842456.1204075251034.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573541#action_12573541 ]

Raghu Angadi commented on HADOOP-2907:
--------------------------------------

Slow reads are the same problem as before; nothing has changed in terms of buffering. But slow writes now have a bigger penalty in terms of buffer space used.

Before 0.16, it did not matter how slowly the client wrote: DFSClient cached the full block on its local disk and then streamed the whole block quickly to the DataNode. In 0.16, the client keeps connections open pretty much as long as the output stream is open. For each client connection, two of the datanodes in the pipeline hold 4 buffers and the last datanode holds 3, each of size io.file.buffer.size. This might explain why you are seeing more OutOfMemory errors now.

We still need to find out how many connections are open. We can expect some datanodes to carry several times the average load. Most of the time these errors are caught by the DataNode data transfer threads; we could print all the stack traces (or some equally useful info) for these exceptions to the log once every few minutes. The stack traces show how many reads and writes are going on. Let me know if I should prepare a patch.
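For a rough sense of scale, here is a back-of-the-envelope sketch of the heap those buffers can consume on a single datanode. The buffer size and connection count below are illustrative assumptions, not measured values from any cluster:

{code}
// Rough estimate of DataNode heap held by write-pipeline buffers in 0.16,
// using the figures above: up to 4 buffers per client connection on a
// datanode, each of io.file.buffer.size bytes. The inputs are assumptions.
public class PipelineBufferEstimate {
  public static void main(String[] args) {
    long bufferSize = 64 * 1024;   // io.file.buffer.size -- 64 KB assumed for illustration
    int openConnections = 500;     // concurrent write pipelines on one datanode -- assumed
    int buffersPerConnection = 4;  // worst case: datanode is not the last in the pipeline

    long heapForBuffers = bufferSize * buffersPerConnection * (long) openConnections;
    System.out.printf("~%d MB of heap just for stream buffers%n",
        heapForBuffers / (1024 * 1024));
    // 64 KB * 4 * 500 = ~125 MB, enough to push a datanode with a small -Xmx over the edge.
  }
}
{code}

With a small 4 KB buffer the same math gives only about 8 MB, so the pressure grows quickly on clusters that raise io.file.buffer.size and carry many concurrent writers.

And a minimal sketch of the rate-limited logging suggested above, i.e. printing the stack traces only once every few minutes. The class and method names here are hypothetical, not existing DataNode code:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical helper: a data transfer thread's catch block would call
// log("writeBlock failed", t); full stack traces are emitted at most once
// per interval, with a count of how many similar errors were suppressed.
public class ThrottledExceptionLogger {
  private static final Log LOG = LogFactory.getLog(ThrottledExceptionLogger.class);
  private static final long LOG_INTERVAL_MS = 5 * 60 * 1000; // "every few minutes" -- 5 min assumed

  private long lastLogTime = 0;
  private long suppressed = 0;

  public synchronized void log(String context, Throwable t) {
    long now = System.currentTimeMillis();
    if (now - lastLogTime >= LOG_INTERVAL_MS) {
      LOG.warn(context + " (" + suppressed + " similar errors suppressed since last report)", t);
      lastLogTime = now;
      suppressed = 0;
    } else {
      suppressed++;  // stay quiet until the interval expires
    }
  }
}
{code}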
> dead datanodes because of OutOfMemoryError
> ------------------------------------------
>
>                 Key: HADOOP-2907
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2907
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>
> We see more dead datanodes than in previous releases. The common exception is found in the out file:
> Exception in thread "org.apache.hadoop.dfs.DataBlockScanner@18166e5" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "DataNode: [dfs.data.dir-value]" java.lang.OutOfMemoryError: Java heap space

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.