hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
Date Wed, 25 Mar 2015 20:40:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380725#comment-14380725 ]

Daryn Sharp commented on HDFS-6658:

bq. It would be nice if this could help with the goals of HDFS-7836 ... Right now I don't
see a path from this patch to there but very possibly I'm missing something.

I think if anything this patch will make your goals easier to achieve due to better abstractions.

Currently the block control logic is diffused throughout the nodes, storages, blockinfos,
BM, etc.  With this patch, the BM (BlockManager) becomes the focal control object for all
block manipulations, and the storages, blockinfos, etc. become dumb model objects.

The BM isn't really aware of the special data structures which are hidden from it via the
BlocksMap's storage & block iterators.  In fact the rest of the BlocksMap relies on its
iterators to hide the implementation details.  It's not until you go into the BlocksMap's
BlockReplicaMap that things get interesting.

If I can clear up a few dependency issues with the storage/block iterators, moving them into
BlockReplicaMap should make the changes invisible.  At that point it should be much easier
to swap in an impl that meets your needs, assuming we can't evolve the new data structures
to be thread-safe.
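
To illustrate the kind of abstraction described above, here is a minimal sketch in which a
control object reaches replica data only through iterators, so the backing structure can be
swapped without touching callers.  Every name here (ReplicaMap, SimpleReplicaMap,
ReplicaController) is hypothetical and invented for illustration; this is not the code in
the patch, and the simple implementation shown is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical interface: callers see only iterators, never the backing structure.
interface ReplicaMap {
    boolean addReplica(long blockId, int storageId);
    boolean removeReplica(long blockId, int storageId);
    Iterator<Integer> storagesOf(long blockId);
    Iterator<Long> blocksOn(int storageId);
}

// One possible backing implementation built on plain collections (not thread-safe).
// Another implementation could replace it without changing any caller.
final class SimpleReplicaMap implements ReplicaMap {
    private final Map<Long, List<Integer>> byBlock = new HashMap<>();
    private final Map<Integer, List<Long>> byStorage = new HashMap<>();

    @Override
    public boolean addReplica(long blockId, int storageId) {
        List<Integer> storages = byBlock.computeIfAbsent(blockId, k -> new ArrayList<>());
        if (storages.contains(storageId)) {
            return false; // replica already recorded
        }
        storages.add(storageId);
        byStorage.computeIfAbsent(storageId, k -> new ArrayList<>()).add(blockId);
        return true;
    }

    @Override
    public boolean removeReplica(long blockId, int storageId) {
        List<Integer> storages = byBlock.get(blockId);
        // Integer.valueOf forces remove(Object), i.e. removal by value, not by position.
        if (storages == null || !storages.remove(Integer.valueOf(storageId))) {
            return false;
        }
        byStorage.get(storageId).remove(Long.valueOf(blockId));
        return true;
    }

    @Override
    public Iterator<Integer> storagesOf(long blockId) {
        List<Integer> storages = byBlock.getOrDefault(blockId, Collections.emptyList());
        return Collections.unmodifiableList(storages).iterator();
    }

    @Override
    public Iterator<Long> blocksOn(int storageId) {
        List<Long> blocks = byStorage.getOrDefault(storageId, Collections.emptyList());
        return Collections.unmodifiableList(blocks).iterator();
    }
}

// Hypothetical control object: all logic lives here and depends only on the interface.
final class ReplicaController {
    private final ReplicaMap replicas;

    ReplicaController(ReplicaMap replicas) {
        this.replicas = replicas;
    }

    int countReplicas(long blockId) {
        int count = 0;
        for (Iterator<Integer> it = replicas.storagesOf(blockId); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }
}
```

Because ReplicaController holds only a ReplicaMap, a different implementation (for example
a thread-safe one) could be passed to its constructor with no other change.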

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Daryn Sharp
>         Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch,
>                      HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx,
>                      New primative indexes.jpg, Old triplets.jpg
> Part of the memory consumed by every BlockInfo object in the Namenode is a linked list
> of block references for every DatanodeStorageInfo (called "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the memory needed
> for every block replica (when compressed oops is disabled), and in our new design the list
> overhead will be per DatanodeStorageInfo and not per block replica.
> See the attached design doc for details and evaluation results.
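
As a generic illustration of the idea in the description (linking list entries with
primitive int indexes rather than object references), here is a minimal sketch of a doubly
linked list whose links live in parallel int arrays.  The class IntLinkedList and its
methods are hypothetical; the attached design doc is not reproduced here, so this should
not be read as the proposed data structure.  On a 64-bit JVM with compressed oops disabled
an object reference takes 8 bytes while an int takes 4, which is where the per-link saving
would come from.

```java
import java.util.Arrays;

// Hypothetical sketch: a doubly linked list whose links are int slot indexes
// held in parallel arrays, instead of object references between nodes.
final class IntLinkedList {
    private static final int NIL = -1; // marks "no neighbour"

    private int[] next;
    private int[] prev;
    private int head = NIL;
    private int size = 0;

    IntLinkedList(int capacity) {
        next = new int[capacity];
        prev = new int[capacity];
        Arrays.fill(next, NIL);
        Arrays.fill(prev, NIL);
    }

    // Links the slot at the head. Precondition: the slot is not currently linked.
    void addFirst(int slot) {
        ensureCapacity(slot + 1);
        next[slot] = head;
        prev[slot] = NIL;
        if (head != NIL) {
            prev[head] = slot;
        }
        head = slot;
        size++;
    }

    // Unlinks the slot in O(1). Precondition: the slot is currently linked.
    void remove(int slot) {
        int p = prev[slot];
        int n = next[slot];
        if (p != NIL) {
            next[p] = n;
        } else {
            head = n;
        }
        if (n != NIL) {
            prev[n] = p;
        }
        next[slot] = NIL;
        prev[slot] = NIL;
        size--;
    }

    int size() {
        return size;
    }

    // Returns the linked slots in list order, head first.
    int[] toArray() {
        int[] out = new int[size];
        int i = 0;
        for (int s = head; s != NIL; s = next[s]) {
            out[i++] = s;
        }
        return out;
    }

    // Grows both arrays so that at least min slots are addressable.
    private void ensureCapacity(int min) {
        int old = next.length;
        if (min <= old) {
            return;
        }
        int cap = Math.max(min, old * 2);
        next = Arrays.copyOf(next, cap);
        prev = Arrays.copyOf(prev, cap);
        Arrays.fill(next, old, cap, NIL);
        Arrays.fill(prev, old, cap, NIL);
    }
}
```

In this sketch one list instance would correspond to one storage, with each slot index
standing in for a block replica, so the array overhead is paid per list rather than per
linked object.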

This message was sent by Atlassian JIRA