From: "Steve Loughran (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Mon, 25 Mar 2013 11:51:16 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-4630) Datanode is going OOM due to small files in hdfs

[ https://issues.apache.org/jira/browse/HDFS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612564#comment-13612564 ]

Steve Loughran commented on HDFS-4630:
--------------------------------------

I'd say "WONTFIX" rather than "INVALID"; the OOM is a result of storing all state in memory so that operations against files, including block retrieval, complete in bounded time. That's a design decision. Now, if you want to put EhCache in behind the scenes, assess its performance with many small files, and test its behaviour on big production clusters, that's a project I'm sure we'd all be curious about, so feel free to have a go!

> Datanode is going OOM due to small files in hdfs
> ------------------------------------------------
>
>                 Key: HDFS-4630
>                 URL: https://issues.apache.org/jira/browse/HDFS-4630
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha
>         Environment: Ubuntu, Java 1.6
>            Reporter: Ankush Bhatiya
>            Priority: Blocker
>
> Hi,
> We have very small files (ranging from 10KB to 1MB) in our HDFS, and the number of files is in the tens of millions. Because of this, both the namenode and the datanode go out of memory very frequently. When we analysed a heap dump of the datanode, most of the memory was used by ReplicaMap.
> Can we use EhCache or something similar so that all of this data is not kept in memory?
> Thanks
> Ankush
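To make the EhCache idea concrete, below is a minimal, hypothetical sketch of a replica map that keeps a bounded number of entries on the heap and spills the rest to a local disk store, using the Ehcache 2.x API that was current at the time. SpillingReplicaMap and ReplicaRecord are illustrative names, not classes from HDFS; the real ReplicaMap keys replicas by block pool id as well as block id and holds richer ReplicaInfo state.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

import java.io.Serializable;

/**
 * Hypothetical sketch only: a block-id -> replica-metadata map that keeps at
 * most maxEntriesOnHeap entries in memory and spills the rest to Ehcache's
 * disk store. Not the real HDFS ReplicaMap.
 */
public class SpillingReplicaMap {

  /** Stand-in for per-replica metadata; must be Serializable to spill to disk. */
  public static class ReplicaRecord implements Serializable {
    final long blockId;
    final long numBytes;
    final long generationStamp;

    ReplicaRecord(long blockId, long numBytes, long generationStamp) {
      this.blockId = blockId;
      this.numBytes = numBytes;
      this.generationStamp = generationStamp;
    }
  }

  // The disk store location comes from the CacheManager configuration;
  // the failsafe default is java.io.tmpdir.
  private final CacheManager manager = CacheManager.create();
  private final Cache cache;

  public SpillingReplicaMap(int maxEntriesOnHeap) {
    CacheConfiguration config = new CacheConfiguration("replicaMap", maxEntriesOnHeap)
        .eternal(true)           // replicas never expire; eviction is purely size-based
        .overflowToDisk(true);   // entries beyond the heap bound spill to disk
    cache = new Cache(config);
    manager.addCache(cache);
  }

  public void add(ReplicaRecord replica) {
    cache.put(new Element(replica.blockId, replica));
  }

  public ReplicaRecord get(long blockId) {
    Element e = cache.get(blockId);
    return e == null ? null : (ReplicaRecord) e.getObjectValue();
  }

  public void remove(long blockId) {
    cache.remove(blockId);
  }
}

Whether disk-backed lookups could meet the datanode's latency expectations for block retrieval is exactly the performance question raised in the comment above; the sketch only shows the wiring, not an answer to that question.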