hadoop-hdfs-issues mailing list archives

From "Dmytro Molkov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel
Date Wed, 23 Jun 2010 00:42:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881489#action_12881489 ]

Dmytro Molkov commented on HDFS-1071:
-------------------------------------

Well, what I mean by the parent thread holding the lock is the following:

The saveNamespace method is synchronized in FSNamesystem. Currently, while holding this
lock, the handler thread walks the tree N times and writes N files, so in effect we assume
that the FSNamesystem lock guards the tree against all modifications.

The same is true for the patch, except that N different threads each walk the tree. We
still operate under the same assumption: while we hold the FSNamesystem lock the tree is
not modified, and the handler thread waits for all worker threads to finish writing their
files before it leaves the section synchronized on FSNamesystem.
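
To make the locking argument concrete, here is a minimal sketch of that pattern. It is not
the code from the patch: the class ParallelSaveSketch, its list-based "tree" and the add
method are hypothetical stand-ins for FSNamesystem and the namespace tree. The handler
thread takes the lock, starts one worker per directory, and joins all of them before it
releases the lock.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for FSNamesystem; not the real HDFS classes.
class ParallelSaveSketch {
    // Stand-in for the namespace tree; every mutation goes through a synchronized method.
    private final List<String> tree = new ArrayList<>();

    public synchronized void add(String entry) {
        tree.add(entry);
    }

    // synchronized: the calling (handler) thread holds the lock for the whole save.
    public synchronized void saveNamespace(List<Path> dirs)
            throws IOException, InterruptedException {
        List<Thread> workers = new ArrayList<>();
        List<IOException> failures = Collections.synchronizedList(new ArrayList<>());
        for (Path dir : dirs) {
            Thread worker = new Thread(() -> {
                try {
                    // Each worker walks the tree on its own. This is safe only because
                    // the parent thread holds the lock, so no writer can get in.
                    Files.write(dir.resolve("fsimage"), tree, StandardCharsets.UTF_8);
                } catch (IOException e) {
                    failures.add(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        // The handler does not leave the synchronized section until every worker is done.
        for (Thread worker : workers) {
            worker.join();
        }
        if (!failures.isEmpty()) {
            throw failures.get(0);
        }
    }
}
```

The workers never take the lock themselves; correctness rests entirely on the parent holding
it until the last join returns.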

We just deployed this patch internally to our production cluster:

2010-06-22 10:12:59,714 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 11906663754 saved in 140 seconds.
2010-06-22 10:13:50,626 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 11906663754 saved in 191 seconds.

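Both copies were written in parallel, so the save took about 191 seconds in total rather
than roughly 331 (140 + 191) done one after the other. This saved us about 140 seconds on
the current image.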

As for both copies being on the same drive: I guess this patch will not give much of an
improvement in that case.
However, I am not sure there is much value in storing two copies of the image on the same
drive. Please correct me if I am wrong, but I thought that multiple copies of the image
should be stored on different drives to help in case of drive failure (or on a filer to
protect against the machine dying). Storing two copies on the same drive only helps with
file corruption (accidental deletion), and that is a weak argument for having multiple
copies on one physical drive.

I like your approach with one thread doing serialization and others doing writes, but it
seems a lot more complicated than the one in this patch.
I am simply executing one call in a newly spawned thread, while the serializer-writer
approach raises more implementation questions, such as what to do with multiple writers
that consume their queues at different speeds. You cannot grow the queue indefinitely, since
the namenode will simply run out of memory; on the other hand, you might want to write things
out to faster consumers as quickly as possible.
The main benefit I see is serializing the tree only once. But since we hold the FSNamesystem
lock at that time, the NameNode doesn't do much anyway, and the patch is no worse than what
was in place before (serialization took place once per image location).
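
For comparison, the serializer-writer alternative could look roughly like the sketch below.
It is only an illustration under assumed names (SerializerWriterSketch, fanOut and the EOF
sentinel are all hypothetical), with in-memory lists standing in for the image files. Each
writer gets a bounded queue, so memory stays capped, but put() blocks once the slowest
writer's queue is full, which means the faster writers end up waiting on it too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the serializer-writer idea; not taken from any patch.
class SerializerWriterSketch {
    // Unique end-of-stream marker, compared by identity so no record can collide with it.
    private static final String EOF = new String("EOF");

    // One serializer (the calling thread) feeds each record to every writer's queue.
    static List<List<String>> fanOut(List<String> records, int writers, int capacity)
            throws InterruptedException {
        List<BlockingQueue<String>> queues = new ArrayList<>();
        List<List<String>> outputs = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
            List<String> out = new ArrayList<>(); // stand-in for one image file
            Thread writer = new Thread(() -> {
                try {
                    for (String r = queue.take(); r != EOF; r = queue.take()) {
                        out.add(r);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            queues.add(queue);
            outputs.add(out);
            threads.add(writer);
            writer.start();
        }
        // Serialize once. put() blocks when a queue is full, so memory is bounded,
        // but the serializer then moves at the pace of the slowest writer.
        for (String record : records) {
            for (BlockingQueue<String> queue : queues) {
                queue.put(record);
            }
        }
        for (BlockingQueue<String> queue : queues) {
            queue.put(EOF);
        }
        for (Thread writer : threads) {
            writer.join();
        }
        return outputs;
    }
}
```

Even this small version needs a queue capacity, an end-of-stream marker and a policy for slow
writers, none of which the one-thread-per-directory approach has to decide.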

> savenamespace should write the fsimage to all configured fs.name.dir in parallel
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-1071
>                 URL: https://issues.apache.org/jira/browse/HDFS-1071
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, HDFS-1071.5.patch, HDFS-1071.patch
>
>
> If you have a large number of files in HDFS, the fsimage file is very big. When the namenode
> restarts, it writes a copy of the fsimage to all directories configured in fs.name.dir. This
> takes a long time, especially if there are many directories in fs.name.dir. Make the NN write
> the fsimage to all these directories in parallel.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

