hadoop-hdfs-issues mailing list archives

From "Dmytro Molkov (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel
Date Tue, 01 Jun 2010 20:48:40 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Dmytro Molkov updated HDFS-1071:

    Attachment: HDFS-1071.4.patch

I added a test that also runs saveNamespace on a running cluster and then checks the image
files written.
As far as locking is concerned, saveNamespace can only be run in safemode, so the writer
threads only perform read operations on a data structure that is essentially read-only, and
they do so while the parent thread is holding the lock, right?
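
To make the locking argument concrete, here is a minimal sketch, not the code in the attached patch. The class name ParallelImageSaver, the namespaceLock object, and the "fsimage" file name are assumptions made for illustration. The parent thread holds the lock for the whole save, starts one writer per directory, and joins them all before releasing the lock; the writers never take the lock themselves and only read the shared image bytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParallelImageSaver {
    // Hypothetical stand-in for the namespace lock; not the real namenode lock.
    private final Object namespaceLock = new Object();

    /** Writes the same image bytes under every directory, one thread per directory. */
    public List<Path> saveToAll(byte[] image, List<Path> dirs) throws InterruptedException {
        List<Path> written = Collections.synchronizedList(new ArrayList<>());
        List<Throwable> errors = Collections.synchronizedList(new ArrayList<>());

        synchronized (namespaceLock) {            // parent holds the lock for the whole save
            List<Thread> workers = new ArrayList<>();
            for (Path dir : dirs) {
                Thread t = new Thread(() -> {
                    try {
                        Files.createDirectories(dir);
                        Path target = dir.resolve("fsimage");
                        Files.write(target, image);   // workers only read the shared array
                        written.add(target);
                    } catch (IOException e) {
                        errors.add(e);                // collect failures for the parent
                    }
                });
                workers.add(t);
                t.start();
            }
            for (Thread t : workers) {
                t.join();                         // wait for every writer before unlocking
            }
        }

        if (!errors.isEmpty()) {
            throw new IllegalStateException("at least one image write failed", errors.get(0));
        }
        return written;
    }
}
```

Because the workers do not acquire namespaceLock, the parent can safely block in join() while holding it. This is only safe as long as nothing mutates the image during the save, which is the safemode assumption stated above.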

As for performance: the current image in our biggest cluster is ~11G, and it takes 1.5-2
minutes to write it out to disk and to the filer each. With parallel writes those latencies
overlap completely, so the total is 1.5-2 minutes for both. That saves us about 1.5 minutes,
since the saving equals the duration of the faster write (80-100 seconds).
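
The arithmetic behind that estimate can be sketched as follows. The SaveLatency class and its method names are made up for illustration, and the model assumes the writes do not slow each other down when run concurrently: sequential time is the sum of the per-directory write times, parallel time is the maximum.

```java
public class SaveLatency {
    /** Total time when directories are written one after another. */
    static long sequentialSeconds(long... writeSeconds) {
        long total = 0;
        for (long s : writeSeconds) {
            total += s;
        }
        return total;
    }

    /** Total time when all directories are written at once (slowest write dominates). */
    static long parallelSeconds(long... writeSeconds) {
        long slowest = 0;
        for (long s : writeSeconds) {
            slowest = Math.max(slowest, s);
        }
        return slowest;
    }

    /** Time saved by writing in parallel instead of sequentially. */
    static long savedSeconds(long... writeSeconds) {
        return sequentialSeconds(writeSeconds) - parallelSeconds(writeSeconds);
    }
}
```

With illustrative values of 90 seconds for the faster write and 120 seconds for the slower one, sequentialSeconds(90, 120) is 210, parallelSeconds(90, 120) is 120, and savedSeconds(90, 120) is 90. With two directories the saving is always the duration of the faster write.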

> savenamespace should write the fsimage to all configured fs.name.dir in parallel
> --------------------------------------------------------------------------------
>                 Key: HDFS-1071
>                 URL: https://issues.apache.org/jira/browse/HDFS-1071
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, HDFS-1071.patch
> If you have a large number of files in HDFS, the fsimage file is very big. When the namenode
> restarts, it writes a copy of the fsimage to all directories configured in fs.name.dir. This
> takes a long time, especially if there are many directories in fs.name.dir. Make the NN write
> the fsimage to all these directories in parallel.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
