hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1070) Speedup NameNode image loading and saving by storing local file names
Date Thu, 07 Apr 2011 20:30:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017129#comment-13017129
] 

Matt Foley commented on HDFS-1070:
----------------------------------

I reviewed the new patch.  I focused on the differences and changes, rather than doing a detailed
read-through again, but it looks generally fine.  Comments (all in FSImageFormat):

loadLocalNameINodes() javadoc says returns number of inodes read, but returns void.

Suggest merging loadDirectories() into its caller (loadLocalNameINodes()), so the file counting
logic is in one place.

"direcotry" is misspelled in comment in body of saveImage

Optional: FSImageFormat.saveImage() is confusing.  It currently has three variables prefixLength,
prefixLen, and newPrefixLength.  (I think two of the three were inherited from the old code.)
 I understand that the point of the manipulations is to special-case the "root directory"
directory name so it doesn't generate a "//" at the beginning of paths.  But I would suggest
something like the following instead:

{code}
    private static void saveImage(ByteBuffer currentDirName,
                                  INodeDirectory current,
                                  DataOutputStream out) throws IOException {
      if (current.getChildrenRaw() == null)
        return;
      // print prefix (parent directory name)
      int prefixLength = currentDirName.position();
      if (prefixLength == 0) {
        //special-case the root directory name
        out.writeShort(PATH_SEPARATOR.length);
        out.write(PATH_SEPARATOR, 0, PATH_SEPARATOR.length);
      } else {
        //all non-root directories
        out.writeShort(prefixLength);
        out.write(currentDirName.array(), 0, prefixLength);
      }
      List<INode> children = current.getChildren();
      out.writeInt(children.size());
      for(INode child : children) {
        // print all children first
        FSImageSerialization.saveINode2Image(child, out);
      }
      for(INode child : children) {
        if(!child.isDirectory() || ((INodeDirectory)child).getChildrenRaw() == null)
          continue;
        // append the subdirectory name to path buffer
        currentDirName.put(PATH_SEPARATOR).put(child.getLocalNameBytes());
        // save the subdirectory
        saveImage(currentDirName, (INodeDirectory)child, out);
        // restore the path buffer
        currentDirName.position(prefixLength);
      }
    }

(This change requires in save() changing the line "strbuf.put(PATH_SEPARATOR);" 
to "strbuf.position(0);", as well as fixing the call signature for saveImage() 
call.)

{code}

While it isn't great to do a conditional every pass through the method, it is a fair trade
for saving a bunch of assignments and an additional arg passed through every call.  And it's
very clear and maintainable.

Performance test results in the next comment.

> Speedup NameNode image loading and saving by storing local file names
> ---------------------------------------------------------------------
>
>                 Key: HDFS-1070
>                 URL: https://issues.apache.org/jira/browse/HDFS-1070
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: trunkLocalNameImage.patch, trunkLocalNameImage1.patch, trunkLocalNameImage3.patch,
trunkLocalNameImage4.patch, trunkLocalNameImage5.patch, trunkLocalNameImage6.patch
>
>
> Currently each inode stores its full path in the fsimage. I'd propose to store the local
name instead. In order for each inode to identify its parent, all inodes in a directory tree
are stored in the image in in-order. This proposal also requires each directory stores the
number of its children in image.
> This proposal would bring a few benefits as pointed below and therefore speedup the image
loading and saving.
> # Remove the overhead of converting java-UTF8 encoded local name to string-represented
full path then to UTF8 encoded full path when saving to an image and vice versa when loading
the image.
> # Remove the overhead of traversing the full path when inserting the inode to its parent
inode.
> # Reduce the number of temporary java objects during the process of image loading or
saving and  therefore reduce the GC overhead.
> # Reduce the size of an image.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message