hadoop-hdfs-issues mailing list archives

From "Jakob Homan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1110) Namenode heap optimization - reuse objects for commonly used file names
Date Thu, 03 Jun 2010 18:50:56 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875240#action_12875240 ]

Jakob Homan commented on HDFS-1110:

Looking good.

* If you keep {{NamespaceDedupe}} (which I would recommend, as I do think it adds value in
and of itself), it's probably best to move its user-facing bits in with the rest of the offline
image viewer code. {{OfflineImageViewer.java}} handles all the command-line arguments and such.
* Line 51 of {{NamespaceDedupe.java}} is longer than 80 characters.
* Nit: {{TestNameDictionary::testNameReuse()}} at first looked to me like a unit test that
hadn't been annotated. Maybe rename it verifyNameReuse?
* The static class {{ByteArray}} seems like a candidate either for being a stand-alone class
or for being wrapped by {{NameDictionary}}; it's not really an integral part of {{FSDirectory}}.
* The {{NameDictionary.lookup(name, value)}} method seems a bit odd in its usage. Both places
it's used call it as dictionary.lookup(name, name), which makes me wonder whether this is the
right API. Do we expect {{NameDictionary}} to be used elsewhere, such that this abstraction is
worth the odd API?
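For illustration, here is a minimal sketch of the last two suggestions combined, assuming a plain HashMap is acceptable: {{ByteArray}} becomes a private detail of {{NameDictionary}}, and the API shrinks to a single-argument {{lookup(byte[])}} that returns the shared array when an equal name is known. The {{add}} and {{size}} methods and the overall shape are hypothetical, not what the patch does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a dictionary of commonly used file names that hands out
 * one shared byte[] per distinct name. ByteArray is kept as a private detail.
 */
class NameDictionary {

    /** Wraps byte[] so it gets value-based equals/hashCode and can be a map key. */
    private static final class ByteArray {
        private final byte[] bytes;
        private final int hash;

        ByteArray(byte[] bytes) {
            this.bytes = bytes;
            this.hash = Arrays.hashCode(bytes); // cache: names are not mutated
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ByteArray && Arrays.equals(bytes, ((ByteArray) o).bytes);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    private final Map<ByteArray, byte[]> canonical = new HashMap<>();

    /** Registers a name as a candidate for reuse; the first copy added wins. */
    void add(byte[] name) {
        canonical.putIfAbsent(new ByteArray(name), name);
    }

    /**
     * Single-argument lookup: returns the shared byte[] equal to name if one is
     * registered, otherwise returns name itself unchanged.
     */
    byte[] lookup(byte[] name) {
        byte[] shared = canonical.get(new ByteArray(name));
        return shared != null ? shared : name;
    }

    int size() {
        return canonical.size();
    }
}
```

With this shape, a caller would write something like {{name = dict.lookup(name)}}, which avoids passing the same argument twice.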

Overall I think this is a good thing to do. A 12-second startup cost in exchange for almost
2 GB of heap savings seems worth it to me. The tradeoff should scale linearly: small clusters
should see essentially no impact, while large clusters pay a very small penalty at startup but
get the benefits for their entire runtime.
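To make the linear tradeoff concrete, here is a rough back-of-the-envelope estimate of heap saved by sharing one byte[] per recurring name. It assumes a 16-byte array header and 8-byte alignment (typical of a 64-bit JVM with compressed oops, but JVM-dependent) and ignores the dictionary's own overhead. The class name is hypothetical, and this is not how the 2 GB figure above was measured.

```java
/**
 * Hypothetical estimate of heap saved by sharing one byte[] per recurring
 * file name instead of allocating one copy per file.
 */
class DedupeSavingsEstimate {
    // ASSUMPTION: header size and alignment vary by JVM; 16 and 8 are typical
    // for a 64-bit HotSpot JVM with compressed oops.
    static final int ARRAY_HEADER_BYTES = 16;
    static final int ALIGNMENT_BYTES = 8;

    /** Approximate footprint of one byte[] of the given length, rounded up to alignment. */
    static long arrayFootprint(int length) {
        long raw = ARRAY_HEADER_BYTES + (long) length;
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    /**
     * Bytes saved when `count` files share one name: one copy is kept and the
     * other count - 1 arrays are never allocated. Dictionary overhead is ignored.
     */
    static long savedBytes(int nameLength, long count) {
        if (count <= 1) {
            return 0;
        }
        return (count - 1) * arrayFootprint(nameLength);
    }

    public static void main(String[] args) {
        // A 10-byte name such as "part-00000" shared by 1,000,001 files.
        System.out.println(savedBytes(10, 1_000_001L)); // prints 32000000
    }
}
```

Under these assumptions, one million extra copies of a 10-byte name come to about 32 MB, and the savings grow linearly with the number of duplicates, just as the startup counting cost grows with the number of files.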

A useful improvement later on may be a safemode command to repopulate the dictionary, which
would take into account changes since cluster startup, particularly newly popular filenames.
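The counting step such a command might perform could look roughly like the sketch below. It assumes the caller can iterate over every file name currently in the namespace. The class, the method, and the {{threshold}} parameter are hypothetical, and the safemode command itself is only a proposal here.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: pick the names that occur at least `threshold` times
 * among the names currently in the namespace, as input for a rebuilt dictionary.
 */
class DictionaryRebuilder {

    static List<byte[]> popularNames(Iterable<byte[]> currentNames, int threshold) {
        // ByteBuffer's equals/hashCode compare the remaining bytes, so equal
        // names collapse onto one key. The buffers are never modified here.
        Map<ByteBuffer, Integer> counts = new HashMap<>();
        for (byte[] name : currentNames) {
            counts.merge(ByteBuffer.wrap(name), 1, Integer::sum);
        }
        List<byte[]> popular = new ArrayList<>();
        for (Map.Entry<ByteBuffer, Integer> e : counts.entrySet()) {
            if (e.getValue() >= threshold) {
                // array() returns the wrapped byte[] (the first copy seen).
                popular.add(e.getKey().array());
            }
        }
        return popular;
    }
}
```

Using {{ByteBuffer.wrap}} as the key avoids a custom wrapper for a one-off count. The order of the returned list is unspecified because it comes from a HashMap.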

> Namenode heap optimization - reuse objects for commonly used file names
> -----------------------------------------------------------------------
>                 Key: HDFS-1110
>                 URL: https://issues.apache.org/jira/browse/HDFS-1110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.22.0
>         Attachments: hdfs-1110.2.patch, hdfs-1110.3.patch, hdfs-1110.patch
> There are a lot of common file names used in HDFS, mainly created by mapreduce, such
> as file names starting with "part". Reusing byte[] corresponding to these recurring file names
> will save significant heap space used for storing the file names in millions of INodeFile

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
