hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7784) load fsimage in parallel
Date Fri, 13 Feb 2015 15:01:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320201#comment-14320201 ]

Walter Su commented on HDFS-7784:

I agree with you. A single Namenode with 64 GB of memory can hold about 100 million files (maybe a little
more). In this situation, the startup time drops from 371s to 159s, and that is not good enough.
Usually we don't restart the Namenode often, so I think it's OK to wait another 2 minutes for
If people store 10x or 100x more than 100 million files, they should consider federation.
So I changed the priority to Minor. I'll still upload the patch; maybe it will help someone.

> load fsimage in parallel
> ------------------------
>                 Key: HDFS-7784
>                 URL: https://issues.apache.org/jira/browse/HDFS-7784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Walter Su
>            Assignee: Walter Su
>            Priority: Minor
>         Attachments: HDFS-7784.001.patch, test-20150213.pdf
> When a single Namenode holds a huge number of files without using federation, startup/restart
is slow. The fsimage loading step takes most of the time. fsimage loading can be separated
into two parts: deserialization and object construction (mostly map insertion). Deserialization
takes most of the CPU time, so we can do deserialization in parallel and add to the hashmap
serially. This will significantly reduce the NN start time.
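
The approach described above (parallel deserialization, serial map insertion) can be sketched roughly as follows. This is only an illustration of the idea, not the attached patch: the class `ParallelLoadSketch`, the `Inode` type, the `deserialize` and `load` methods, and the "id:name" record format are all hypothetical and do not correspond to HDFS code or to the real fsimage layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelLoadSketch {

    /** Hypothetical stand-in for a deserialized inode; not an HDFS class. */
    static final class Inode {
        final long id;
        final String name;

        Inode(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** CPU-bound step: decode one record in the invented "id:name" format. */
    static Inode deserialize(byte[] record) {
        String s = new String(record, StandardCharsets.UTF_8);
        int sep = s.indexOf(':');
        return new Inode(Long.parseLong(s.substring(0, sep)), s.substring(sep + 1));
    }

    /** Deserialize records on a thread pool, then fill the map on the calling thread. */
    static Map<Long, Inode> load(List<byte[]> records, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: submit every record for parallel deserialization.
            List<Future<Inode>> pending = new ArrayList<>(records.size());
            for (byte[] record : records) {
                Callable<Inode> task = () -> deserialize(record);
                pending.add(pool.submit(task));
            }
            // Phase 2: insert serially, in submission order, so the
            // plain HashMap is only ever touched by this one thread.
            Map<Long, Inode> inodeMap = new HashMap<>();
            for (Future<Inode> f : pending) {
                Inode inode = f.get();
                inodeMap.put(inode.id, inode);
            }
            return inodeMap;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> records = new ArrayList<>();
        for (long i = 1; i <= 1000; i++) {
            records.add((i + ":file-" + i).getBytes(StandardCharsets.UTF_8));
        }
        Map<Long, Inode> inodeMap = load(records, 4);
        System.out.println("loaded inodes: " + inodeMap.size());
    }
}
```

Keeping the insertion on one thread avoids any locking on the map. The cost is that only the deserialization phase scales with the number of threads.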

This message was sent by Atlassian JIRA
