hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
Date Mon, 14 Jul 2014 22:50:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061381#comment-14061381 ]

Colin Patrick McCabe commented on HDFS-6658:
--------------------------------------------

I think we need to take a longer-term view here.  Clearly, the number of replicas
in the cluster is going to double a few times over the next few years.  The big problem is that
beyond a certain size, JVM heaps simply become unmanageable: full GC pauses grow too long, and we
lose optimizations like compressed oops (a speed as well as a memory win).  Compressed oops are
not available once the JVM heap grows beyond 32 GB.
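
As a concrete check (just a sketch, not part of any patch): the running NameNode JVM can be
asked whether compressed oops are still in effect, which is worth verifying on large-heap
clusters since HotSpot silently drops the flag once -Xmx crosses roughly 32 GB.

{code:java}
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class CompressedOopsCheck {
  public static void main(String[] args) {
    // Ask HotSpot for the current value of the UseCompressedOops flag.
    HotSpotDiagnosticMXBean hs =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    System.out.println("UseCompressedOops = "
        + hs.getVMOption("UseCompressedOops").getValue());
    // Compressed oops stop being usable once the max heap exceeds ~32 GB.
    System.out.println("Max heap (MB) = "
        + Runtime.getRuntime().maxMemory() / (1024 * 1024));
  }
}
{code}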

Optimizations that only give a constant-factor reduction in the NameNode's JVM heap size, like
this one (HDFS-6658), aren't a very long-term solution.  Another doubling in the number of
block replicas would more than wipe out the gain here.  Splitting the block manager out into
a separate daemon (HDFS-5477) isn't a long-term solution either.  Sure, it cuts the NameNode
heap roughly in half (or by whatever fraction of the NN heap the BlockManager takes up), but
another doubling wipes that out too.

Putting the {{BlockManager}} memory off-heap is a more long-term solution to the problem.
 With the block replicas and the inodes off-heap, we could have a small NameNode heap, and
long GCs would never be a problem again.  Of course, eventually there may be other bottlenecks
in the system, like the size of full block reports.  Nobody said off-heap was a silver bullet.
 But it seems like a useful starting point for many other optimizations.
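
To make the off-heap idea concrete, here is a minimal sketch (assuming a made-up record layout
and class name, not the HDFS-5477 design): fixed-size replica records kept in a direct
ByteBuffer, addressed by integer slots, so the GC never traces them and the 32 GB
compressed-oops limit doesn't apply to them.

{code:java}
import java.nio.ByteBuffer;

// Illustrative only: record layout, field names, and sizing are assumptions.
public class OffHeapReplicaTable {
  private static final int RECORD_BYTES = 24;   // blockId + storageId + flags
  private final ByteBuffer slab;

  public OffHeapReplicaTable(int maxReplicas) {
    // Direct buffers live outside the Java heap; a real implementation
    // would shard across several buffers to get past the 2 GB int limit.
    slab = ByteBuffer.allocateDirect(maxReplicas * RECORD_BYTES);
  }

  public void put(int slot, long blockId, long storageId, long flags) {
    int off = slot * RECORD_BYTES;
    slab.putLong(off, blockId);
    slab.putLong(off + 8, storageId);
    slab.putLong(off + 16, flags);
  }

  public long blockIdAt(int slot) {
    return slab.getLong(slot * RECORD_BYTES);
  }
}
{code}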

This optimization, by contrast, doesn't seem like a useful starting point for anything.  If I
understand correctly, its best-case claimed benefit (20%) only applies when users have compressed
oops disabled.  If compressed oops are on, Java references are already only 4 bytes, and
performance will actually be worse, right?  I think we should consider alternatives before
we go down this path.
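
Rough arithmetic behind that point (my own back-of-envelope, not numbers from the design doc):
today each replica costs three entries in the triplets array (the DatanodeStorageInfo plus the
prev/next BlockInfo links).  With compressed oops each reference is already 4 bytes, the same as
a primitive int index, so swapping pointers for ints buys nothing on pointer width alone; only
with 8-byte uncompressed references is there an obvious per-pointer win.

{code:java}
public class ReplicaPointerMath {
  public static void main(String[] args) {
    int refsPerReplica = 3;    // DatanodeStorageInfo, prev BlockInfo, next BlockInfo
    int compressedRef = 4;     // bytes per reference with compressed oops
    int uncompressedRef = 8;   // bytes per reference without compressed oops
    int intIndex = 4;          // bytes per primitive int index

    System.out.println("compressed oops:    " + refsPerReplica * compressedRef
        + " B/replica of pointers vs " + refsPerReplica * intIndex + " B with int indexes");
    System.out.println("no compressed oops: " + refsPerReplica * uncompressedRef
        + " B/replica of pointers vs " + refsPerReplica * intIndex + " B with int indexes");
  }
}
{code}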

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Amir Langer
>         Attachments: Namenode Memory Optimizations - Block replicas list.docx
>
>
> Part of the memory consumed by every BlockInfo object in the Namenode is a linked list
> of block references for every DatanodeStorageInfo (called "triplets").
> We propose to change the way we store this list in memory.
> Using primitive integer indexes instead of object references will reduce the memory needed
> for every block replica (when compressed oops is disabled), and in our new design the list
> overhead will be per DatanodeStorageInfo rather than per block replica.
> See the attached design doc for details and evaluation results.
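
For readers skimming the description, a hedged sketch of the proposed shape (names and layout
are mine, not from the attached doc): each DatanodeStorageInfo keeps a growable int[] of indexes
into a global block table, instead of every BlockInfo carrying per-replica prev/next object
references, so the list bookkeeping is paid once per storage.

{code:java}
import java.util.Arrays;

// Illustrative only; the actual proposal is in the attached design doc.
public class StorageReplicaList {
  private int[] blockIndexes = new int[16];  // primitive 4-byte indexes
  private int size;

  public void add(int blockIndex) {
    if (size == blockIndexes.length) {
      // Amortized growth: array overhead is per storage, not per replica.
      blockIndexes = Arrays.copyOf(blockIndexes, size * 2);
    }
    blockIndexes[size++] = blockIndex;
  }

  public int get(int i) {
    return blockIndexes[i];
  }

  public int size() {
    return size;
  }
}
{code}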



--
This message was sent by Atlassian JIRA
(v6.2#6252)
