hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
Date Wed, 16 Jul 2014 22:25:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064241#comment-14064241 ]

Colin Patrick McCabe commented on HDFS-6658:

bq. I'm not totally convinced of the long GC argument. It's true that a worst case full-gc
will be much longer. However, isn't it also the case that we should almost never be doing
worst case full-GCs?  On a large and busy NN, we see a GC greater than 2 seconds maybe once
every couple of days. Usually the big outliers are the result of a very large application
doing something bad - in which case even if you solve the GC problem, something else is liable
to cause the NN to be unresponsive.

For applications like HBase, long GC pauses are a huge problem.  Even 2 seconds is an unpleasant
request latency.  NameNode failover can help, but setting NN failover times too low makes
the system unstable.

bq. This is a pretty sizable improvement though so it seems well worth considering.

I'm +1 for this improvement as long as it doesn't substantially regress memory consumption
on NameNodes that have compressed oops turned on (many users run this way).  Can you check

Separately, I'll open a JIRA about off-heap data structures which will be a little longer-term.

> Namenode memory optimization - Block replicas list 
> ---------------------------------------------------
>                 Key: HDFS-6658
>                 URL: https://issues.apache.org/jira/browse/HDFS-6658
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.4.1
>            Reporter: Amir Langer
>            Assignee: Amir Langer
>         Attachments: Namenode Memory Optimizations - Block replicas list.docx
> Part of the memory consumed by every BlockInfo object in the Namenode is a linked list
> of block references for every DatanodeStorageInfo (called "triplets").
> We propose to change the way we store the list in memory.
> Using primitive integer indexes instead of object references will reduce the memory needed
> for every block replica (when compressed oops is disabled) and in our new design the list
> overhead will be per DatanodeStorageInfo and not per block replica.
> See the attached design doc for details and evaluation results.
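
To make the idea in the description concrete, here is a minimal Java sketch of a doubly linked list whose links are primitive int indexes held in arrays rather than object references. It is an illustration only and not the design from the attached document: the class name IntIndexedReplicaList, the fixed capacity, and the notion of a replica "slot" number are all assumptions made for this example. The relevance to the compressed oops question is that an int is always 4 bytes, while an object reference on 64-bit HotSpot is 8 bytes without compressed oops and 4 bytes with them, so the per-link saving would be expected mainly when compressed oops is disabled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: a doubly linked list of replica slots for one storage,
 * linked by int indexes instead of object references. Not the HDFS-6658 code.
 */
final class IntIndexedReplicaList {
    private static final int NIL = -1;   // marks "no neighbour"

    private final int[] next;            // next[slot] = following slot, or NIL
    private final int[] prev;            // prev[slot] = preceding slot, or NIL
    private int head = NIL;
    private int size = 0;

    IntIndexedReplicaList(int capacity) {
        next = new int[capacity];
        prev = new int[capacity];
        Arrays.fill(next, NIL);
        Arrays.fill(prev, NIL);
    }

    /** Links the slot in at the head of the list in O(1). Assumes it is not already linked. */
    void add(int slot) {
        next[slot] = head;
        prev[slot] = NIL;
        if (head != NIL) {
            prev[head] = slot;
        }
        head = slot;
        size++;
    }

    /** Unlinks the slot in O(1). Assumes the slot is currently in the list. */
    void remove(int slot) {
        int p = prev[slot];
        int n = next[slot];
        if (p != NIL) {
            next[p] = n;
        } else {
            head = n;
        }
        if (n != NIL) {
            prev[n] = p;
        }
        next[slot] = NIL;
        prev[slot] = NIL;
        size--;
    }

    int size() {
        return size;
    }

    /** Returns the linked slots from head to tail. */
    List<Integer> slots() {
        List<Integer> out = new ArrayList<>();
        for (int s = head; s != NIL; s = next[s]) {
            out.add(s);
        }
        return out;
    }
}
```

In this sketch the two link arrays belong to the list as a whole, which loosely mirrors the description's point that list overhead would be carried per DatanodeStorageInfo rather than per block replica. A real implementation would also need growth, slot reuse, and a mapping from slots back to blocks, none of which is shown here.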

This message was sent by Atlassian JIRA
