hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4630) Datanode is going OOM due to small files in hdfs
Date Mon, 25 Mar 2013 11:51:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612564#comment-13612564 ]

Steve Loughran commented on HDFS-4630:
--------------------------------------

I'd say "WONTFIX" over invalid; the OOM is a result of storing all state in memory for
bounded-time operations against files, including block retrieval. That's a design decision.
Now, if you want to put EhCache in behind the scenes, assess its performance with many small
files, and its behaviour on big production clusters, that's a project I'm sure we'd all be
curious about. Feel free to have a go!
                
> Datanode is going OOM due to small files in hdfs
> ------------------------------------------------
>
>                 Key: HDFS-4630
>                 URL: https://issues.apache.org/jira/browse/HDFS-4630
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha
>         Environment: Ubuntu, Java 1.6
>            Reporter: Ankush Bhatiya
>            Priority: Blocker
>
> Hi, 
> We have very small files (sizes ranging from 10KB to 1MB) in our HDFS, and the number of
> files is in the tens of millions. Because of this, both the namenode and the datanode run
> out of memory very frequently. When we analysed the heap dump of the datanode, most of the
> memory was used by ReplicaMap.
> Can we use EhCache or something similar to avoid storing all the data in memory?
> Thanks
> Ankush

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
