hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
Date Mon, 14 Jun 2010 03:22:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878462#action_12878462
] 

Scott Carey commented on HDFS-1114:
-----------------------------------

if you are using a power of two hash table, you can avoid problems caused by hash value clustering
by using a Fibonacci Hash.   Essentially, use the multiplicative hash with a special value
g:

(h * g) & mask

where h is the hash value and g is the 'golden ratio' number for the size of the table used.
 Since multiplication on today's processors is far faster than division or remainders, this
can be used to 'uncluster' hash values.  A single consecutive run of values gets maximally
distributed into the space, and high and low bits are redistributed evenly so that the mask
does not increase collisions.  Whether this is a desired property or not will depend on the
properties of the hash values and whether or not an open addressing solution is used.

Open addressing can further reduce the memory footprint by allowing the raw object to be placed
in the map instead of a container object or list.

some links found from a few searches:
http://www.brpreiss.com/books/opus4/html/page214.html
http://staff.newport.ac.uk/ctubb01/ct/advp/hashtables.pdf

> Reducing NameNode memory usage by an alternate hash table
> ---------------------------------------------------------
>
>                 Key: HDFS-1114
>                 URL: https://issues.apache.org/jira/browse/HDFS-1114
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: GSet20100525.pdf, gset20100608.pdf, h1114_20100607.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are many blocks
in HDFS, this map uses a lot of memory in the NameNode.  We may optimize the memory usage
by a light weight hash table implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message