hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
Date Thu, 03 Jun 2010 17:39:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875211#action_12875211 ]

Suresh Srinivas commented on HDFS-1114:

# Not using a supplemental hash function will result in severe clustering when we move to sequential
block IDs (as only the higher bits are used for the hash).
# Why do we need the option to configure either Java's HashMap or this new implementation?
#* With the new implementation, BlockInfo implements the LinkedElement interface. After switching
to Java's HashMap, would it continue to implement this interface and incur the cost of the {{next}}
member in BlockInfo?
# In the "Arrays" section, the description of GC behavior is not clear. How is the GC behavior
better with arrays?
# A static array size for the map simplifies the code, but pushes complexity to the cluster
admin by adding one more configuration parameter. This parameter is an internal implementation
detail that a cluster admin may not understand or set correctly. If it is configured wrong and the
cluster continues to work, the cluster admin may not be aware of the performance degradation.
# I feel we should implement resizing to avoid introducing a config param. Resizing is a rare event
on a stable cluster. The NN has enough heap headroom to account for floating garbage and the YG
(young generation) guarantee. Hence availability of memory should not be an issue. In the worst
case, a resize may trigger a full GC.
# If we implement resizing, we should also reconsider the 2^N table size, as it has the potential
to waste a lot of memory during doubling, especially considering the millions of entries in the table.
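
To illustrate the clustering concern in the first point, here is a minimal sketch. It is not the code from the patch: the class name, the 2^10 bucket count, the indexing by the highest bits, and the multiplicative mixing step are all assumptions chosen for illustration. It counts how many distinct buckets 1024 sequential IDs occupy with and without a supplemental mixing step.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not the code from the HDFS-1114 patch. */
final class HashSpreadSketch {
    static final int TABLE_BITS = 10; // assumed table of 2^10 buckets

    /** Fold a 64-bit block ID into 32 bits, the same way Long.hashCode does. */
    static int rawHash(long blockId) {
        return (int) (blockId ^ (blockId >>> 32));
    }

    /** Index taken from the highest TABLE_BITS bits, with no extra mixing. */
    static int indexWithoutMixing(long blockId) {
        return rawHash(blockId) >>> (32 - TABLE_BITS);
    }

    /** Same index, after a multiplicative mixing step (assumed, for illustration). */
    static int indexWithMixing(long blockId) {
        int mixed = rawHash(blockId) * 0x9E3779B9; // roughly 2^32 / golden ratio
        return mixed >>> (32 - TABLE_BITS);
    }

    /** Number of distinct buckets hit by the sequential IDs 0..idCount-1. */
    static int distinctBuckets(int idCount, boolean mix) {
        Set<Integer> used = new HashSet<>();
        for (long id = 0; id < idCount; id++) {
            used.add(mix ? indexWithMixing(id) : indexWithoutMixing(id));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("without mixing: " + distinctBuckets(1024, false));
        System.out.println("with mixing:    " + distinctBuckets(1024, true));
    }
}
```

Under these assumptions, all 1024 sequential IDs share a single bucket without mixing, because their high bits are identical, while the mixed version spreads them over many buckets.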

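The memory cost of doubling a 2^N table, raised in the last point, can be estimated with some simple arithmetic. The sketch below is hypothetical: it assumes one array slot per entry and 8-byte references (a 64-bit JVM without compressed references), neither of which comes from the design document.

```java
/** Hypothetical sizing arithmetic; the 8-byte reference size is an assumption. */
final class DoublingCostSketch {
    static final long REF_BYTES = 8; // assumed: 64-bit JVM, no compressed references

    /** Smallest power of two that is greater than or equal to n. */
    static long nextPowerOfTwo(long n) {
        return n <= 1 ? 1 : Long.highestOneBit(n - 1) << 1;
    }

    /** Slots left empty when the table length is rounded up to 2^N. */
    static long unusedSlots(long entries) {
        return nextPowerOfTwo(entries) - entries;
    }

    /** Bytes held while the old array and its doubled replacement are both live. */
    static long peakBytesDuringDoubling(long oldLength) {
        return (oldLength + 2 * oldLength) * REF_BYTES;
    }

    public static void main(String[] args) {
        long entries = (1L << 27) + 1; // one entry past a power-of-two boundary
        System.out.println("table length: " + nextPowerOfTwo(entries));
        System.out.println("unused slots: " + unusedSlots(entries));
        System.out.println("peak bytes while doubling: " + peakBytesDuringDoubling(1L << 27));
    }
}
```

With these assumed numbers, growing one entry past 2^27 leaves almost 2^27 slots (about 1 GiB of references) unused, and the old and new arrays together occupy 3 GiB while the entries are being moved.
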
> Reducing NameNode memory usage by an alternate hash table
> ---------------------------------------------------------
>                 Key: HDFS-1114
>                 URL: https://issues.apache.org/jira/browse/HDFS-1114
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: GSet20100525.pdf
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are many blocks
in HDFS, this map uses a lot of memory in the NameNode.  We may optimize the memory usage
by a light weight hash table implementation.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
