hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-1366) reduce namenode startup time by optimising checkBlockInfo while loading fsimage
Date Tue, 05 Jul 2011 23:00:16 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HDFS-1366:
-----------------------------

    Attachment: FSImageRead_shortcut_proto.patch

The code has changed since this ticket was opened.  In March I did some experiments, and at
that time there was no longer a BlocksMap.checkBlockInfo() method, and the call sequence was:
{code}
FSImage.loadFSImage()
  FSImageFormat.Loader.load()
    FSImageFormat.Loader.loadFullNameINodes()
      FSDirectory.addToParent()
        BlockManager.addINode()
          BlocksMap.addINode()
{code}

BlocksMap.addINode() did this:
{code}
  BlockInfo addINode(BlockInfo b, INodeFile iNode) {
    BlockInfo info = blocks.get(b);
    if (info != b) {
      info = b;
      blocks.put(info);
    }
    info.setINode(iNode);
    return info;
  }
{code}
which could be replaced by
{code}
  BlockInfo addINode(BlockInfo b, INodeFile iNode) {
    blocks.put(b);
    b.setINode(iNode);
    return b;
  }
{code}
Calling blocks.get() before conditionally calling blocks.put() in this way is wasteful regardless
of whether we are reading the FSImage or calling addINode() for any other purpose, because
the costs of put and get are about the same, and the result of just calling put is identical
to that of the above code. However, when I put this change into a simple proof-of-principle
patch (attached, not ready for prime time) and tried it, I only got a 6% improvement in
FSImage load time.
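
To illustrate the equivalence argument above, here is a minimal, self-contained sketch of the two versions of addINode(). It is not the Hadoop code: BlockSet and the simplified BlockInfo are hypothetical stand-ins backed by a java.util.HashMap, the iNode is a plain Object, and the sketch assumes that put() replaces any existing entry with the same block id. It counts table operations only and says nothing about wall-clock load time.

```java
import java.util.HashMap;
import java.util.Map;

final class AddINodeSketch {
    // Hypothetical stand-in for BlockInfo: identified by block id only.
    static final class BlockInfo {
        final long blockId;
        Object iNode; // placeholder for INodeFile
        BlockInfo(long blockId) { this.blockId = blockId; }
        void setINode(Object iNode) { this.iNode = iNode; }
    }

    // Hypothetical stand-in for the block table; assumes put() replaces
    // an existing entry that has the same block id.
    static final class BlockSet {
        private final Map<Long, BlockInfo> map = new HashMap<>();
        int lookups = 0; // number of hash-table operations performed
        BlockInfo get(BlockInfo b) { lookups++; return map.get(b.blockId); }
        void put(BlockInfo b) { lookups++; map.put(b.blockId, b); }
        BlockInfo peek(long blockId) { return map.get(blockId); } // uncounted, for checks
    }

    // Original shape: get, then put unless the same object is already stored.
    static BlockInfo addINodeOld(BlockSet blocks, BlockInfo b, Object iNode) {
        BlockInfo info = blocks.get(b);
        if (info != b) {
            info = b;
            blocks.put(info);
        }
        info.setINode(iNode);
        return info;
    }

    // Proposed shape: unconditional put.
    static BlockInfo addINodeNew(BlockSet blocks, BlockInfo b, Object iNode) {
        blocks.put(b);
        b.setINode(iNode);
        return b;
    }

    public static void main(String[] args) {
        BlockSet oldSet = new BlockSet();
        BlockSet newSet = new BlockSet();
        Object file = new Object();
        // Simulate an image load: every block is new to the table.
        for (long id = 0; id < 1000; id++) {
            addINodeOld(oldSet, new BlockInfo(id), file);
            addINodeNew(newSet, new BlockInfo(id), file);
        }
        System.out.println("old table operations: " + oldSet.lookups);
        System.out.println("new table operations: " + newSet.lookups);
    }
}
```

In both versions the table ends up holding b and b is returned, whether the table was empty, already held b, or held a different object with the same id. When every block is new, as during an image load, the old shape performs two table operations per block and the new shape performs one.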


> reduce namenode startup time by optimising checkBlockInfo while loading  fsimage 
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-1366
>                 URL: https://issues.apache.org/jira/browse/HDFS-1366
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>         Attachments: FSImageRead_shortcut_proto.patch
>
>
> The namenode spends about 10 minutes reading a 14 GB fsimage file into memory and
> creating all the in-memory data structures. A jstack-based debugger clearly shows that most
> of the time during the fsimage load is spent in BlocksMap.checkBlockInfo. There is an easy
> way to optimize this method, especially for this code path.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
