From: "Steve Loughran (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Mon, 25 Mar 2013 11:51:16 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-4630) Datanode is going OOM due to small files in hdfs

[ https://issues.apache.org/jira/browse/HDFS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612564#comment-13612564 ]

Steve Loughran commented on HDFS-4630:
--------------------------------------

I'd say "WONTFIX" rather than "INVALID"; the OOM is a result of storing all state in memory so that operations against files, including block retrieval, complete in bounded time. That's a design decision. Now, if you want to put EhCache in behind the scenes, assess its performance with many small files, and test its behaviour on big production clusters, that's a project I'm sure we'd all be curious about, so feel free to have a go!

> Datanode is going OOM due to small files in hdfs
> ------------------------------------------------
>
>                 Key: HDFS-4630
>                 URL: https://issues.apache.org/jira/browse/HDFS-4630
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha
>         Environment: Ubuntu, Java 1.6
>            Reporter: Ankush Bhatiya
>            Priority: Blocker
>
> Hi,
> We have very small files (ranging from 10KB to 1MB) in our HDFS, and the number of files is in the tens of millions. Because of this, both the namenode and the datanode go out of memory very frequently. When we analysed a heap dump of the datanode, most of the memory was used by ReplicaMap.
> Can we use EhCache or something similar so that all of this data is not kept in memory?
> Thanks
> Ankush
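To make the EhCache idea concrete, below is a minimal, hypothetical sketch of a replica map that keeps a bounded number of entries on the heap and spills the rest to a local disk store, using the Ehcache 2.x API that was current at the time. SpillingReplicaMap and ReplicaRecord are illustrative names, not classes from HDFS; the real ReplicaMap keys replicas by block pool id as well as block id and holds richer ReplicaInfo state.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

import java.io.Serializable;

/**
 * Hypothetical sketch only: a block-id -> replica-metadata map that keeps at
 * most maxEntriesOnHeap entries in memory and spills the rest to Ehcache's
 * disk store. Not the real HDFS ReplicaMap.
 */
public class SpillingReplicaMap {

  /** Stand-in for per-replica metadata; must be Serializable to spill to disk. */
  public static class ReplicaRecord implements Serializable {
    final long blockId;
    final long numBytes;
    final long generationStamp;

    ReplicaRecord(long blockId, long numBytes, long generationStamp) {
      this.blockId = blockId;
      this.numBytes = numBytes;
      this.generationStamp = generationStamp;
    }
  }

  // The disk store location comes from the CacheManager configuration;
  // the failsafe default is java.io.tmpdir.
  private final CacheManager manager = CacheManager.create();
  private final Cache cache;

  public SpillingReplicaMap(int maxEntriesOnHeap) {
    CacheConfiguration config = new CacheConfiguration("replicaMap", maxEntriesOnHeap)
        .eternal(true)           // replicas never expire; eviction is purely size-based
        .overflowToDisk(true);   // entries beyond the heap bound spill to disk
    cache = new Cache(config);
    manager.addCache(cache);
  }

  public void add(ReplicaRecord replica) {
    cache.put(new Element(replica.blockId, replica));
  }

  public ReplicaRecord get(long blockId) {
    Element e = cache.get(blockId);
    return e == null ? null : (ReplicaRecord) e.getObjectValue();
  }

  public void remove(long blockId) {
    cache.remove(blockId);
  }
}

Whether disk-backed lookups could meet the datanode's latency expectations for block retrieval is exactly the performance question raised in the comment above; the sketch only shows the wiring, not an answer to that question.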