hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1110) Namenode heap optimization - reuse objects for commonly used file names
Date Thu, 13 May 2010 06:26:53 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HDFS-1110:

    Attachment: hdfs-1110.3.patch

The attached patch implements the solution I proposed previously. The chosen mechanism
brings an additional optimization: during startup, the byte[] is reused for any name used
more than once (not just for names used more than 10 times).

Only names used more than 10 times are added to the dictionary. Adding other names would
undo the space savings, because of the heap used to store the hashmap entries (as described earlier).
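
For readers without the patch at hand, the mechanism described above can be sketched roughly as follows. This is a minimal illustration and not the code in hdfs-1110.3.patch: the class name NameDictionarySketch, its method names, and the use of ByteBuffer as a content-comparing map key are all assumptions made for the example. During startup every name is counted and repeated names share the first byte[] seen; only names whose use count exceeds the threshold are kept in the long-lived dictionary.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the class from the patch.
class NameDictionarySketch {
    // Startup-only bookkeeping: the first array seen for a name and its use count.
    private static final class Entry {
        final byte[] name;
        int uses = 1;
        Entry(byte[] name) { this.name = name; }
    }

    private final int threshold;
    // Discarded when startup ends, so rarely used names cost no permanent map entry.
    private Map<ByteBuffer, Entry> startupCounts = new HashMap<>();
    // Long-lived dictionary holding only names used more than `threshold` times.
    private final Map<ByteBuffer, byte[]> dictionary = new HashMap<>();

    NameDictionarySketch(int threshold) { this.threshold = threshold; }

    // Returns the array the caller should store; it may be a shared instance.
    // Shared arrays must be treated as immutable by all callers.
    byte[] intern(byte[] name) {
        // ByteBuffer compares by content, unlike a raw byte[] used as a key.
        ByteBuffer key = ByteBuffer.wrap(name);
        byte[] shared = dictionary.get(key);
        if (shared != null) {
            return shared;
        }
        if (startupCounts == null) {
            return name; // startup is over: no counting, no new dictionary entries
        }
        Entry entry = startupCounts.get(key);
        if (entry == null) {
            startupCounts.put(key, new Entry(name));
            return name;
        }
        entry.uses++;
        if (entry.uses > threshold) {
            dictionary.put(ByteBuffer.wrap(entry.name), entry.name);
        }
        // Reuse the first array even for names that never reach the threshold.
        return entry.name;
    }

    // Drops the startup counts so only the dictionary stays on the heap.
    void finishStartup() { startupCounts = null; }

    int dictionarySize() { return dictionary.size(); }

    public static void main(String[] args) {
        NameDictionarySketch names = new NameDictionarySketch(10);
        byte[] first = names.intern("part-00000".getBytes(StandardCharsets.UTF_8));
        byte[] again = names.intern("part-00000".getBytes(StandardCharsets.UTF_8));
        System.out.println("same array reused: " + (first == again));
    }
}
```

In this sketch the startup map is dropped by finishStartup(), so a name used only a few times costs no permanent map entry, while its byte[] is still shared among the inodes loaded during startup.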

I ran tests to benchmark startup time and total heap size (measured by triggering a full GC
after startup). Here are the results:
|| ||Without patch||With patch||
||Startup Time|880s|892s|
||Heap size|24.197G|22.372G|

Startup time increased by 12s (about 1.4%), while 1.825G (about 7.5%) was saved from the 24.197G heap.
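
The message does not say which tool was used to trigger the GC and read the heap size. As one possible way to take a comparable reading from inside a JVM, and to reproduce the arithmetic on the figures in the table, here is a small sketch using the standard java.lang.management API. The class name HeapAfterGcSketch and its method names are made up for this example, and a GC request is only a hint to the JVM, so readings can vary from run to run.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper, not part of the patch or of HDFS.
class HeapAfterGcSketch {
    // Requests a GC and returns the used heap in bytes afterwards.
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // same effect as System.gc(); the JVM may not run a full collection
        return memory.getHeapMemoryUsage().getUsed();
    }

    // Fraction of the original value that was saved, e.g. 0.1 for 10%.
    static double savedFraction(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        // Figures taken from the results table in this message.
        double heapSaved = savedFraction(24.197, 22.372);
        double startupCost = (892.0 - 880.0) / 880.0;
        System.out.printf("heap saved: %.1f%%%n", heapSaved * 100);
        System.out.printf("startup time increase: %.1f%%%n", startupCost * 100);
        System.out.println("used heap now (bytes): " + usedHeapAfterGc());
    }
}
```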

BTW I am thinking of removing the NamespaceDeduper tool included in the patch. Any thoughts?

> Namenode heap optimization - reuse objects for commonly used file names
> -----------------------------------------------------------------------
>                 Key: HDFS-1110
>                 URL: https://issues.apache.org/jira/browse/HDFS-1110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.22.0
>         Attachments: hdfs-1110.2.patch, hdfs-1110.3.patch, hdfs-1110.patch
> There are a lot of common file names used in HDFS, mainly created by mapreduce, such
> as file names starting with "part". Reusing byte[] corresponding to these recurring file names
> will save significant heap space used for storing the file names in millions of INodeFile

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
