hadoop-hdfs-issues mailing list archives

From "Jakob Homan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel
Date Mon, 25 Oct 2010 23:18:22 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924775#action_12924775 ]

Jakob Homan commented on HDFS-1071:
-----------------------------------

bq. Could you please verify this. If the images are the same I'm fine with the implementation.
In the patch, {{FSNamesystem::saveNamespace()}} acquires the write lock before calling
{{FSImage::saveNamespace(renewCheckpointTime)}}.  The images are written in parallel, and each
of the writer threads is joined (in {{waitForThreads}}) before the method returns and the
write lock is released.  No other thread can modify the namespace while the images are being
written, so this should be safe.
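
As a rough illustration of the pattern described above (not the patch's code), the sketch below takes a write lock, starts one writer thread per directory, and joins every thread before the lock is released. The class {{ParallelSaveSketch}}, its methods, and the directory names are hypothetical stand-ins; the "write" is just an append to a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namesystem; none of these names come from the patch.
final class ParallelSaveSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Writer threads append concurrently, so the list is wrapped for thread safety.
    private final List<String> written =
        Collections.synchronizedList(new ArrayList<String>());

    // Writes to every directory in parallel and returns the directories written so far.
    List<String> saveNamespace(List<String> dirs) {
        lock.writeLock().lock();
        try {
            List<Thread> writers = new ArrayList<>();
            for (final String dir : dirs) {
                // Assumption: a real writer would serialize the image into dir here.
                Thread t = new Thread(() -> written.add(dir), "saver-" + dir);
                writers.add(t);
                t.start();
            }
            // Join every writer before leaving the locked region.
            for (Thread t : writers) {
                t.join();
            }
            // Copy while holding the list's monitor, as Collections.synchronizedList requires.
            synchronized (written) {
                return new ArrayList<>(written);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isWriteLocked() {
        return lock.isWriteLocked();
    }

    public static void main(String[] args) {
        ParallelSaveSketch ns = new ParallelSaveSketch();
        List<String> done = ns.saveNamespace(Arrays.asList("/name1", "/name2", "/name3"));
        System.out.println("directories written: " + done.size());
    }
}
```

Because the joins happen inside the {{try}} block, the lock is released only after every writer has finished, which is the property the argument above relies on.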

There are other calls to {{saveNamespace}} that should be considered, though.
{{FSImage::saveNamespace(renewCheckpointTime)}} is also called from {{FSDirectory::loadFSImage}}
(which is called by FSNamesystem's constructors), {{BackupStorage::saveCheckpoint()}},
{{CheckpointStorage::doMerge()}}, and {{FSImage::doImportCheckpoint}}.  Assuming no new
operations are coming in, which they shouldn't be, the checkpoint and backup node calls are
safe.  The others are safe as well, assuming we're in safe mode.  Does this sound reasonable?

I believe this addresses Konstantin's concerns.

A couple of nits with the current patch (6):
* Java's Collections documentation is pretty adamant that a synchronized collection must be
traversed while holding a lock on the collection (http://download.oracle.com/javase/6/docs/api/java/util/Collections.html#synchronizedList(java.util.List)),
which the patch currently doesn't do in {{processIOErrors}} for the {{sds}} parameter.
This isn't necessary at the moment, as only one thread is guaranteed to be iterating, but
it may be better to synchronize now to avoid problems in the future.
* The MiniDFSCluster constructors have been deprecated since this patch was generated.  The
patch should be updated to use the new Builder.
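
To make the first nit concrete, here is a minimal sketch of traversing a list returned by {{Collections.synchronizedList}} while holding the list's own monitor. The class {{SynchronizedTraversalSketch}}, the method {{removeFailed}}, and the use of strings in place of storage directories are all assumptions for illustration; this is not the body of {{processIOErrors}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical example; strings stand in for the storage directories in sds.
final class SynchronizedTraversalSketch {

    // Removes every entry starting with failedPrefix and returns how many were removed.
    static int removeFailed(List<String> sds, String failedPrefix) {
        int removed = 0;
        // Individual calls on a synchronized list are atomic, but iteration is not,
        // so the whole traversal is done while holding the list's monitor.
        synchronized (sds) {
            for (Iterator<String> it = sds.iterator(); it.hasNext();) {
                if (it.next().startsWith(failedPrefix)) {
                    it.remove();
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> sds = Collections.synchronizedList(
            new ArrayList<>(Arrays.asList("/ok/name1", "/bad/name2", "/ok/name3")));
        int removed = removeFailed(sds, "/bad");
        System.out.println("removed " + removed + ", remaining " + sds.size());
    }
}
```

With a single iterating thread the {{synchronized}} block changes nothing, but it keeps the traversal correct if a second thread ever touches the list.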
 

> savenamespace should write the fsimage to all configured fs.name.dir in parallel
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-1071
>                 URL: https://issues.apache.org/jira/browse/HDFS-1071
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, HDFS-1071.5.patch, HDFS-1071.6.patch, HDFS-1071.patch
>
>
> If you have a large number of files in HDFS, the fsimage file is very big. When the namenode
> restarts, it writes a copy of the fsimage to all directories configured in fs.name.dir. This
> takes a long time, especially if there are many directories in fs.name.dir. Make the NN write
> the fsimage to all these directories in parallel.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

