hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1070) Speedup NameNode image loading and saving by storing local file names
Date Fri, 01 Apr 2011 01:50:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014348#comment-13014348 ]

Matt Foley commented on HDFS-1070:
----------------------------------

I was talking with Sanjay and he would like us to consider making the serialized tree format
more robust.  With the old "full names" format, an arbitrary chunk of the file could be lost
or corrupted, and the remainder would still be recoverable.  With this tree format, if a chunk
is lost or corrupted, everything thereafter is difficult or impossible to reconstruct because
the tree information is all implicit in the ordering.
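
To make the concern concrete, here is a minimal sketch in Python (a toy model, not the HDFS Java code or the actual fsimage layout). It assumes each record holds only a local name and a child count, with a parent written before its children, so the parent of every inode is implied purely by position. The names save_tree and load_tree and the nested-dict tree are hypothetical, chosen for illustration only.

```python
def save_tree(name, node, out):
    """Append (local_name, child_count) records depth-first.

    Directories are dicts, files are None; files get a child count of -1.
    """
    if isinstance(node, dict):
        out.append((name, len(node)))
        for child_name, child in sorted(node.items()):
            save_tree(child_name, child, out)
    else:
        out.append((name, -1))
    return out


def load_tree(records):
    """Rebuild full paths using only record order and child counts."""
    it = iter(records)
    paths = []

    def read(parent):
        name, count = next(it)
        # The root is stored with an empty local name.
        path = (parent.rstrip("/") + "/" + name) if name else "/"
        paths.append(path)
        for _ in range(max(count, 0)):
            read(path)

    read("")
    return paths


tree = {"a": {"x": None, "y": None}, "b": {"z": None}, "f": None}
records = save_tree("", tree, [])
print(load_tree(records))
# ['/', '/a', '/a/x', '/a/y', '/b', '/b/z', '/f']

# Lose the single record for directory "b": nothing fails, but "z" is
# silently re-parented to the root because position is the only structure.
damaged = [r for r in records if r != ("b", 1)]
print(load_tree(damaged))
# ['/', '/a', '/a/x', '/a/y', '/z', '/f']
```

In this toy model a single lost record does not raise an error; it shifts the implied parent of what follows, which is the failure mode described above.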

Things that would help are:
* making more of the tree-structure information explicit
* providing re-synchronization points or labels in the format

The simplest idea that occurred to me (and I haven't had time to discuss it with Sanjay yet)
is to go back to a breadth-first ordering, so that the members of each directory are grouped
together, and to output the full-path name of the directory at the beginning of each such group.
The full-path string provides both the structure information and the re-synchronization
capability, just as the full names do now.  Maybe we can avoid actually parsing that string in
most cases.  This would be going back to something like the old saveImage(), except that we
would output only the local name with each inode, and output the full-path name once at the
start of each new breadth-first directory group.
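
A rough sketch of this idea, in Python rather than the Java of saveImage(), and under stated assumptions: records are simple tuples, a "DIR" record carries the directory's full path, and a LOST sentinel stands in for a chunk the reader has detected as missing or corrupt. None of these names or record shapes come from the patch; they are hypothetical.

```python
from collections import deque

LOST = ("LOST", "")  # hypothetical marker for a detected gap in the image


def save_image_bfs(root):
    """Serialize a nested-dict tree breadth-first (dict = dir, None = file).

    Each directory group starts with a ("DIR", full_path) header, followed
    by one record per child that holds only the child's local name.
    """
    records = []
    queue = deque([("/", root)])
    while queue:
        path, children = queue.popleft()
        records.append(("DIR", path))
        for name, node in sorted(children.items()):
            kind = "d" if isinstance(node, dict) else "f"
            records.append((kind, name))
            if kind == "d":
                queue.append((path.rstrip("/") + "/" + name, node))
    return records


def load_image_bfs(records):
    """Rebuild full paths; after a gap, skip ahead to the next DIR header."""
    paths = []
    current = None  # full path of the directory group being read
    for kind, name in records:
        if kind == "LOST":
            current = None  # structure unknown until we re-synchronize
        elif kind == "DIR":
            current = name  # header restores both structure and sync
        elif current is not None:
            paths.append(current.rstrip("/") + "/" + name)
    return paths


tree = {"a": {"x": None, "y": None}, "b": {"z": None}, "f": None}
records = save_image_bfs(tree)
print(load_image_bfs(records))
# ['/a', '/b', '/f', '/a/x', '/a/y', '/b/z']

# Replace three records (file "f", the "/a" header, and "x") with a gap.
damaged = records[:3] + [LOST] + records[6:]
print(load_image_bfs(damaged))
# ['/a', '/b', '/b/z']
```

Here the loader never parses the header string beyond using it as a prefix, and everything after the next intact header is recovered with its correct parent; only the group that straddles the gap is lost.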

This will reduce the benefit of the change, but if we believe that the number of files is
typically much larger than the number of directories, there could still be a big benefit.
What do you think?  Would it be hard to re-run your experiment with this modification to
measure its cost?
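
As a back-of-the-envelope illustration of that trade-off (not a measurement, and not a substitute for re-running the experiment), the sketch below counts only the name bytes written for file inodes under three schemes: full path per inode, local name only, and local name plus one full-path header per directory. The function name name_bytes and the synthetic paths are made up for the example; directory inode records and all other per-inode fields are ignored.

```python
import posixpath


def name_bytes(file_paths):
    """Count UTF-8 name bytes for file inodes under three naming schemes."""
    full = sum(len(p.encode("utf-8")) for p in file_paths)
    local = sum(len(posixpath.basename(p).encode("utf-8")) for p in file_paths)
    # One full-path header per distinct parent directory.
    parents = {posixpath.dirname(p) for p in file_paths}
    headers = sum(len(d.encode("utf-8")) for d in parents)
    return {
        "full_path": full,
        "local_only": local,
        "local_plus_dir_header": local + headers,
    }


# Synthetic input: 1000 files in a single directory.
paths = ["/data/logs/part-%05d" % i for i in range(1000)]
print(name_bytes(paths))
# {'full_path': 21000, 'local_only': 10000, 'local_plus_dir_header': 10010}
```

With many files per directory the header cost is amortized to almost nothing; with many small or empty directories the hybrid scheme approaches the full-path cost, which is why the files-to-directories ratio matters.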

> Speedup NameNode image loading and saving by storing local file names
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1070
>                 URL: https://issues.apache.org/jira/browse/HDFS-1070
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: trunkLocalNameImage.patch, trunkLocalNameImage1.patch, trunkLocalNameImage3.patch, trunkLocalNameImage4.patch, trunkLocalNameImage5.patch
>
>
> Currently each inode stores its full path in the fsimage. I propose storing the local
> name instead. So that each inode can identify its parent, all inodes in a directory tree
> are stored in the image in in-order. This proposal also requires each directory to store the
> number of its children in the image.
> This proposal would bring a few benefits, listed below, and would therefore speed up image
> loading and saving.
> # Remove the overhead of converting the java-UTF8 encoded local name to a string-represented
> full path and then to a UTF8 encoded full path when saving an image, and vice versa when
> loading the image.
> # Remove the overhead of traversing the full path when inserting the inode into its parent
> inode.
> # Reduce the number of temporary java objects created during image loading or saving, and
> therefore reduce the GC overhead.
> # Reduce the size of an image.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
