hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
Date Fri, 01 Feb 2013 23:18:13 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-4465:

    Attachment: dn-memory-improvements.patch

Hey Suresh, thanks a lot for filing this issue. A little while back I threw together a few
changes to see how much we could reduce the DN's memory overhead with minimal effort.
Here's a little patch (not necessarily ready for commit) which shows the changes I made. This
patch does three things:

# Reduce the number of repeated String/char[] objects by storing a single reference to the base
path and then, per replica, an int[] of integers denoting the subdirs from the base dir to the
replica file, e.g. "1, 34, 2".
# Switch to using LightWeightGSet instead of standard java.util structures where possible
in the DN. We already did this in the NN, but with a little adaptation we can do it for some
of the DN's data structures as well.
# Intern File objects where possible. Interning the repeated Strings/char[] underlying File
objects is a step in the right direction, but we can do a little better by doing our
own interning of File objects to further reduce overhead from repeated objects.
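The path-compaction idea in item 1 could be sketched roughly like this (a minimal illustration; `CompactReplicaPath` and its fields are made-up names, not the actual classes in the patch):

```java
import java.io.File;

// Hypothetical sketch: instead of one full path String per replica, keep
// one shared base-directory reference per volume and, per replica, only a
// small int[] of subdir indices, e.g. {1, 34, 2} for
// <base>/subdir1/subdir34/subdir2.
class CompactReplicaPath {
    private final File baseDir;   // shared across all replicas on a volume
    private final int[] subdirs;  // per-replica: just a few ints

    CompactReplicaPath(File baseDir, int[] subdirs) {
        this.baseDir = baseDir;
        this.subdirs = subdirs;
    }

    // Rebuild the full replica directory on demand instead of storing it.
    File getDir() {
        File dir = baseDir;
        for (int i : subdirs) {
            dir = new File(dir, "subdir" + i);
        }
        return dir;
    }
}
```

The trade-off is a small amount of CPU (rebuilding the File when needed) for a large reduction in long-lived heap, which fits the access pattern of replica metadata.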

Using this patch, per-replica heap usage in my test setup dropped from ~650 bytes to ~250
bytes.
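The File-interning idea from item 3 might look something like this hypothetical sketch (`FileInterner` is an illustrative name, not a class from the patch):

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: many replicas live under the same directory, so
// mapping every equal File to one canonical instance avoids holding
// duplicate File objects (and their underlying String/char[] data) on
// the heap.
class FileInterner {
    private final Map<File, File> pool = new HashMap<>();

    // Return the canonical File equal to f, registering f if it is new.
    synchronized File intern(File f) {
        File canonical = pool.get(f);
        if (canonical == null) {
            pool.put(f, f);
            canonical = f;
        }
        return canonical;
    }
}
```

After interning, equal directories are represented by a single object, so the saving scales with the number of replicas that share a parent directory.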

Feel free to take this patch and run with it, use it for ideas, or ignore it entirely.
> Optimize datanode ReplicasMap and ReplicaInfo
> ---------------------------------------------
>                 Key: HDFS-4465
>                 URL: https://issues.apache.org/jira/browse/HDFS-4465
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: dn-memory-improvements.patch
> In Hadoop, a lot of optimization has been done in namenode data structures to make them memory
> efficient. Similar optimizations are necessary for the Datanode process. With the growth in
> storage per datanode and the number of blocks hosted per datanode, this jira intends to optimize
> the long-lived ReplicasMap and ReplicaInfo objects.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
