hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1110) Namenode heap optimization - reuse objects for commonly used file names
Date Tue, 11 May 2010 05:11:31 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866060#action_12866060 ]

dhruba borthakur commented on HDFS-1110:
----------------------------------------

Awesome. I like this idea because it needs no configuration and is automatic. My
opinion is that it is OK to do this only during NN startup. Maybe you can have only one
dictionary where you store the count of occurrences as well, i.e. move the count from the transient
map into the dictionary; otherwise the same logic as you described. When the image is fully
loaded, we have to purge the dictionary of all items whose count is less than 10.
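
To make the single-dictionary suggestion concrete, here is a rough sketch of how it might look. This is not the code from the attached patches: the class name `NameDictionary` and the methods `intern`, `purge`, and `size` are hypothetical, and the sketch uses current Java syntax rather than what the 0.22 code base would have used. Only the threshold of 10 comes from the comment above.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a name dictionary that also keeps occurrence counts. */
class NameDictionary {
    /** A shared name array plus the number of times that name was seen. */
    private static final class Entry {
        final byte[] name;
        int count;
        Entry(byte[] name) { this.name = name; }
    }

    // byte[] has identity-based equals/hashCode, so wrap it in a ByteBuffer,
    // which compares and hashes by content.
    private final Map<ByteBuffer, Entry> entries = new HashMap<>();

    /** Called for each file name while the image loads; returns the shared array. */
    byte[] intern(byte[] name) {
        Entry e = entries.get(ByteBuffer.wrap(name));
        if (e == null) {
            e = new Entry(name);
            entries.put(ByteBuffer.wrap(name), e);
        }
        e.count++;
        return e.name;
    }

    /** Called once after the image is fully loaded: drop rarely used names. */
    void purge(int minCount) {
        entries.values().removeIf(e -> e.count < minCount);
    }

    /** Number of names currently held in the dictionary. */
    int size() { return entries.size(); }
}
```

In this sketch the loader would call `intern(name)` for every INodeFile name and store the returned array, then call `purge(10)` when loading finishes. Purging only removes the dictionary's own entries; inodes that already hold a shared array keep their reference to it.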

Also, this is related to HDFS-1140.


> Namenode heap optimization - reuse objects for commonly used file names
> -----------------------------------------------------------------------
>
>                 Key: HDFS-1110
>                 URL: https://issues.apache.org/jira/browse/HDFS-1110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.22.0
>
>         Attachments: hdfs-1110.2.patch, hdfs-1110.patch
>
>
> There are a lot of common file names used in HDFS, mainly created by mapreduce, such
> as file names starting with "part". Reusing byte[] corresponding to these recurring file names
> will save significant heap space used for storing the file names in millions of INodeFile
> objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

