hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1565) DFSScalability: reduce memory usage of namenode
Date Fri, 06 Jul 2007 17:49:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510751 ]

Raghu Angadi commented on HADOOP-1565:
--------------------------------------

When I last checked a couple of months ago, each file name String somehow ended up backed
by a 128-byte array. Could you double-check? Milind noticed that this might be because
substring() is used to get the file name from the full path. If that is the case, fixing it
could save around 100 bytes per file.
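
To illustrate the substring() concern: on older JVMs (Java 6 and earlier), substring() returned a String that shared the parent's character array, so a short file name could keep the whole path's array alive. The following is a minimal sketch of one possible workaround, not the actual namenode code; the class FileNames and the method lastComponent are hypothetical names.

```java
// Hypothetical helper, for illustration only.
final class FileNames {
    private FileNames() {}

    /** Returns the last component of a slash-separated path as an independent String. */
    static String lastComponent(String fullPath) {
        int slash = fullPath.lastIndexOf('/');
        // If there is no slash, lastIndexOf returns -1 and substring(0) is the whole path.
        String name = fullPath.substring(slash + 1);
        // On JVMs where substring() shares the parent's char[], new String(...)
        // copies only the characters the name needs. On newer JVMs substring()
        // already copies, so this extra copy is redundant but harmless.
        return new String(name);
    }

    public static void main(String[] args) {
        System.out.println(lastComponent("/user/data/part-00000"));
    }
}
```

Whether this actually recovers the roughly 100 bytes per file would need to be measured on the JVM in question.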


> DFSScalability: reduce memory usage of namenode
> -----------------------------------------------
>
>                 Key: HADOOP-1565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1565
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> Experiments have demonstrated that a single file/block needs about 300 to 500 bytes of
> main memory on a 64-bit Namenode. This puts some limitations on the size of the file system
> that a single namenode can support. Most of this overhead occurs because a block and/or filename
> is inserted into multiple TreeMaps and/or HashSets.
> Here are a few ideas that can be measured to see if an appreciable reduction of memory
> usage occurs:
> 1. Change FSDirectory.children from a TreeMap to an array. Do a binary search in this array
> when looking up children. This saves a TreeMap object for every intermediate node in the
> directory tree.
> 2. Change INode so that it is no longer an inner class. This saves one "parent object"
> reference for each INode instance: 4 bytes per inode.
> 3. Keep all DatanodeDescriptors in an array. BlocksMap.nodes[] is currently a 64-bit
> reference to the DatanodeDescriptor object. Instead, it can be a 'short' index into that array.
> This will probably save about 16 bytes per block.
> 4. Change DatanodeDescriptor.blocks from a SortedTreeMap to a HashMap? The CPU cost of
> block report processing may increase.
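
Idea 1 above can be sketched as follows. This is only an illustration under stated assumptions, not the FSDirectory code: the class DirNode and its methods getChild and addChild are hypothetical names, and children are assumed to be ordered by name.

```java
import java.util.Arrays;

// Hypothetical directory node that keeps its children in a sorted array
// instead of a TreeMap.
final class DirNode {
    final String name;
    private DirNode[] children = new DirNode[0]; // always sorted by name

    DirNode(String name) { this.name = name; }

    /** Binary search; returns the index if found, else -(insertionPoint + 1). */
    private int search(String childName) {
        int lo = 0, hi = children.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = children[mid].name.compareTo(childName);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }

    DirNode getChild(String childName) {
        int i = search(childName);
        return i >= 0 ? children[i] : null;
    }

    /** Inserts the child at its sorted position; returns false if the name exists. */
    boolean addChild(DirNode child) {
        int i = search(child.name);
        if (i >= 0) return false;
        int at = -(i + 1);
        DirNode[] grown = Arrays.copyOf(children, children.length + 1);
        // Shift the tail one slot to the right to open a gap at 'at'.
        System.arraycopy(children, at, grown, at + 1, children.length - at);
        grown[at] = child;
        children = grown;
        return true;
    }
}
```

The trade-off in this sketch is that lookups stay O(log n) while each insertion copies the array, so it favors directories that are read more often than they are modified.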
> For the record, a TreeMap entry has the following fields:
> 	Object key;
> 	Object value;
> 	Entry left = null;
> 	Entry right = null;
> 	Entry parent;
> 	boolean color = BLACK;
> and a HashMap entry has:
> 	final Object key;
> 	Object value;
> 	final int hash;
> 	Entry next;
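
The field lists above allow a rough per-entry size estimate. The sketch below assumes a 64-bit JVM with a 16-byte object header, 8-byte (uncompressed) references, and 8-byte object alignment; these figures are assumptions, and actual layouts vary by JVM and settings. The class EntrySizeEstimate is a hypothetical name.

```java
// Back-of-the-envelope estimate only; all constants are assumptions.
final class EntrySizeEstimate {
    static final int HEADER = 16; // assumed object header, bytes
    static final int REF = 8;     // assumed uncompressed 64-bit reference, bytes
    static final int ALIGN = 8;   // assumed object alignment, bytes

    /** Rounds a size up to the next multiple of ALIGN. */
    static int align(int bytes) {
        return (bytes + ALIGN - 1) / ALIGN * ALIGN;
    }

    /** key, value, left, right, parent (5 references) plus a 1-byte boolean. */
    static int treeMapEntry() {
        return align(HEADER + 5 * REF + 1);
    }

    /** key, value, next (3 references) plus a 4-byte int hash. */
    static int hashMapEntry() {
        return align(HEADER + 3 * REF + 4);
    }

    public static void main(String[] args) {
        System.out.println("TreeMap entry: " + treeMapEntry() + " bytes"); // 64
        System.out.println("HashMap entry: " + hashMapEntry() + " bytes"); // 48
    }
}
```

Under these assumptions a TreeMap entry costs about 16 bytes more than a HashMap entry, before counting the map object itself or a HashMap's bucket array.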

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

