hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7784) load fsimage in parallel
Date Sat, 14 Feb 2015 00:56:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321075#comment-14321075 ]

Kai Zheng commented on HDFS-7784:

Hi [~walter.k.su],

It's interesting, thanks!
bq.So I changed the priority to minor
I don't think it's minor. The proposal makes sense, and I think this is a good discussion.
bq.One thing we might consider is a two-thread system, where one thread does deserialization
and puts the results into a BlockingQueue read by the other FSN loading thread. 
I think that's a good idea. We might consider it as well and give it a try.
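
To make the two-thread idea concrete, here is a minimal sketch, not taken from any patch on this issue. One thread deserializes records and puts them on a bounded BlockingQueue, and the loading thread takes them off and does the map insertion. The class names (TwoThreadLoadSketch, ParsedInode), the "id:name" record format, and the queue capacity are all assumptions made for illustration; the real fsimage is decoded from protobuf sections and populates FSNamesystem structures.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these types exist in HDFS.
final class TwoThreadLoadSketch {

    // Stand-in for a deserialized inode.
    static final class ParsedInode {
        final long id;
        final String name;
        ParsedInode(long id, String name) { this.id = id; this.name = name; }
    }

    // Sentinel the producer enqueues after the last record.
    private static final ParsedInode END = new ParsedInode(-1L, "");

    // Toy "deserialization" of an "id:name" string (assumed format).
    static ParsedInode deserialize(String raw) {
        int sep = raw.indexOf(':');
        return new ParsedInode(Long.parseLong(raw.substring(0, sep)), raw.substring(sep + 1));
    }

    static Map<Long, String> load(List<String> rawRecords) {
        // Bounded queue so the deserializer cannot run arbitrarily far ahead.
        BlockingQueue<ParsedInode> queue = new ArrayBlockingQueue<>(1024);

        Thread producer = new Thread(() -> {
            try {
                for (String raw : rawRecords) {
                    queue.put(deserialize(raw));
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fsimage-deserializer");
        producer.start();

        // The calling thread plays the role of the loading thread:
        // it is the only one that touches the map, so no locking is needed.
        Map<Long, String> inodeMap = new HashMap<>();
        try {
            while (true) {
                ParsedInode rec = queue.poll(100, TimeUnit.MILLISECONDS);
                if (rec == null) {
                    // Nothing arrived; fail if the producer died without sending END.
                    if (!producer.isAlive() && queue.isEmpty()) {
                        throw new IllegalStateException("deserializer exited before finishing");
                    }
                    continue;
                }
                if (rec == END) {
                    break;
                }
                inodeMap.put(rec.id, rec.name);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            producer.interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        return inodeMap;
    }
}
```

For example, TwoThreadLoadSketch.load(Arrays.asList("1:a", "2:b")) returns a map with two entries. Whether this helps in practice depends on how the deserialization cost compares with the map insertion cost.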

So we have the current approach, the parallel approach proposed here, and the one suggested above
by [~cmccabe]. Would it be possible to make the fsimage loading approach pluggable? By default
it would use the current method.
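
As a rough sketch of what a pluggable loader could look like, the snippet below selects a loading strategy by name and falls back to a serial default when no name is given. Everything here is hypothetical: the interface ImageLoadStrategy, the classes SerialLoadStrategy and ImageLoadStrategies, the strategy name "serial", and the toy "id:name" record format are invented for illustration and are not HDFS APIs or configuration keys.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical extension point; not an HDFS interface.
interface ImageLoadStrategy {
    Map<Long, String> load(List<String> rawRecords);
}

// Stand-in for the current method: deserialize and insert on one thread.
final class SerialLoadStrategy implements ImageLoadStrategy {
    @Override
    public Map<Long, String> load(List<String> rawRecords) {
        Map<Long, String> inodeMap = new HashMap<>();
        for (String raw : rawRecords) {
            int sep = raw.indexOf(':'); // assumed "id:name" toy format
            inodeMap.put(Long.parseLong(raw.substring(0, sep)), raw.substring(sep + 1));
        }
        return inodeMap;
    }
}

// Hypothetical registry mapping a configured name to a strategy factory.
final class ImageLoadStrategies {
    static final String DEFAULT = "serial";

    private static final Map<String, Supplier<ImageLoadStrategy>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put(DEFAULT, SerialLoadStrategy::new);
    }

    // Other approaches (parallel, two-thread) would register themselves here.
    static void register(String name, Supplier<ImageLoadStrategy> factory) {
        REGISTRY.put(name, factory);
    }

    // A null or empty name selects the default, i.e. the current method.
    static ImageLoadStrategy forName(String name) {
        String key = (name == null || name.isEmpty()) ? DEFAULT : name;
        Supplier<ImageLoadStrategy> factory = REGISTRY.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("unknown fsimage load strategy: " + key);
        }
        return factory.get();
    }
}
```

With this shape, the three approaches could be compared side by side without changing the caller, which only sees ImageLoadStrategy.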

> load fsimage in parallel
> ------------------------
>                 Key: HDFS-7784
>                 URL: https://issues.apache.org/jira/browse/HDFS-7784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Walter Su
>            Assignee: Walter Su
>            Priority: Minor
>         Attachments: HDFS-7784.001.patch, test-20150213.pdf
> When a single NameNode holds a huge number of files and federation is not used, startup/restart
is slow. The fsimage loading step takes most of that time. fsimage loading can be separated
into two parts: deserialization and object construction (mostly map insertion). Deserialization
takes most of the CPU time, so we can do deserialization in parallel and add to the hashmap
serially. This will significantly reduce the NN start time.
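
The split described above (parallel deserialization, serial map insertion) can be sketched as follows. This is an illustration under stated assumptions, not the code in HDFS-7784.001.patch: the class ParallelDeserializeSketch, the ParsedInode type, the "id:name" record format, and the chunking by a fixed chunkSize are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these types exist in HDFS.
final class ParallelDeserializeSketch {

    // Stand-in for a deserialized inode.
    static final class ParsedInode {
        final long id;
        final String name;
        ParsedInode(long id, String name) { this.id = id; this.name = name; }
    }

    // Toy "deserialization" of an "id:name" string (assumed format).
    static ParsedInode deserialize(String raw) {
        int sep = raw.indexOf(':');
        return new ParsedInode(Long.parseLong(raw.substring(0, sep)), raw.substring(sep + 1));
    }

    static Map<Long, String> load(List<String> rawRecords, int threads, int chunkSize) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: deserialize chunks in parallel on the worker threads.
            List<Future<List<ParsedInode>>> futures = new ArrayList<>();
            for (int start = 0; start < rawRecords.size(); start += chunkSize) {
                List<String> chunk =
                    rawRecords.subList(start, Math.min(start + chunkSize, rawRecords.size()));
                futures.add(pool.submit(() -> {
                    List<ParsedInode> out = new ArrayList<>(chunk.size());
                    for (String raw : chunk) {
                        out.add(deserialize(raw));
                    }
                    return out;
                }));
            }

            // Phase 2: insert serially on the calling thread, in chunk order,
            // so the plain HashMap is never accessed concurrently.
            Map<Long, String> inodeMap = new HashMap<>();
            for (Future<List<ParsedInode>> future : futures) {
                for (ParsedInode inode : future.get()) {
                    inodeMap.put(inode.id, inode.name);
                }
            }
            return inodeMap;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("deserialization failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Keeping insertion on a single thread avoids any need for a concurrent map; the speedup then comes only from the deserialization phase, which the description identifies as the CPU-heavy part.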

This message was sent by Atlassian JIRA
