hadoop-hdfs-dev mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1443) Improve Datanode startup time
Date Thu, 07 Oct 2010 21:18:31 GMT
Improve Datanode startup time

                 Key: HDFS-1443
                 URL: https://issues.apache.org/jira/browse/HDFS-1443
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: data-node
    Affects Versions: 0.20.2
            Reporter: Matt Foley
            Assignee: Matt Foley
             Fix For: 0.22.0

One of the factors slowing down cluster restart is the startup time of the Datanodes.  In
particular, if an Upgrade is needed, each Datanode must take a Snapshot, and this can take 5-15
minutes per volume, run serially.  Thus, for a 4-disk datanode, it may be 45 minutes before it
is ready to send its initial Block Report to the Namenode.  This is an umbrella bug for the
following four pieces of work to improve Datanode startup time:

1. Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory
instead of once per file.  This is the biggest villain, responsible for 90% of that 45-minute
delay.  See the subordinate bug for details.
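The batching idea in item 1 could be sketched as follows.  This is an illustrative sketch only: the class and method names are hypothetical, and the real change batches the FileUtil.createHardLink() calls inside DataStorage; the point is that the link helper is driven once per directory listing rather than once per individual file.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Illustrative sketch (hypothetical names): hard-link every regular file in
// srcDir into dstDir from a single per-directory call, instead of invoking
// a link helper separately for each file.
public class HardLinkBatcher {
    public static int hardLinkAllFiles(File srcDir, File dstDir) throws IOException {
        String[] names = srcDir.list();   // list the directory once
        if (names == null) {
            return 0;                     // not a directory, or I/O error
        }
        int linked = 0;
        for (String name : names) {
            File src = new File(srcDir, name);
            if (src.isFile()) {
                // create dstDir/name as a hard link to srcDir/name
                Files.createLink(new File(dstDir, name).toPath(), src.toPath());
                linked++;
            }
        }
        return linked;
    }
}
```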

2. Refactor Upgrade process in DataStorage to run volume-parallel.  There is already a bug
open for this, HDFS-270, and the volume-parallel work in DirectoryScanner from HDFS-854 is
a good foundation to build on.
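The volume-parallel execution in item 2 might look like the following sketch.  The `upgradeVolume` callback is a hypothetical stand-in for DataStorage's per-volume upgrade step, not the actual API; the structure simply runs one task per volume and waits for all of them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Sketch (hypothetical names): upgrade all volumes concurrently, one task
// per storage directory, instead of one after another.
public class ParallelVolumeUpgrade {
    public static void upgradeAll(List<File> volumes, Consumer<File> upgradeVolume)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (File vol : volumes) {
                pending.add(pool.submit(() -> upgradeVolume.accept(vol)));
            }
            for (Future<?> f : pending) {
                f.get();  // wait for completion and propagate any per-volume failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the Snapshot cost is dominated by per-volume disk work, running the volumes concurrently should cut the serial 45-minute case down toward the time of the slowest single volume.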

3. Refactor the FSDir() and getVolumeMap() call chains in FSDataset, so they share data and
run volume-parallel.  Currently the two constructors for the in-memory directory tree and replicas
map run THREE full scans of the entire disk - once in FSDir(), once in recoverTempUnlinkedBlock(),
and once in addToReplicasMap().  During each scan, a new File object is created for each of
the 100,000 or so items in the native file system (for a 50,000-block node).  This impacts
GC as well as disk traffic.
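The single-pass idea in item 3 can be illustrated abstractly: one recursive traversal creates each File object once and uses it both for recursion and for populating the replicas map, where the current code makes three separate passes.  All names below are hypothetical, not the actual FSDataset API.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

// Sketch (hypothetical names): a single recursive walk that builds the
// block-id -> file map in the same pass that traverses the directory tree,
// so each native-file-system item yields one File object instead of three.
public class SinglePassScan {
    public static void scan(File dir, Map<Long, File> replicaMap) {
        File[] entries = dir.listFiles();   // list this directory once
        if (entries == null) return;
        for (File f : entries) {
            String name = f.getName();
            if (f.isDirectory()) {
                scan(f, replicaMap);        // recurse, reusing this File object
            } else if (name.startsWith("blk_") && !name.endsWith(".meta")) {
                // block files are named blk_<id>; meta files are skipped here
                long blockId = Long.parseLong(name.substring("blk_".length()));
                replicaMap.put(blockId, f); // populate the map in the same pass
            }
        }
    }
}
```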

4. Make getGenerationStampFromFile() more efficient.  Currently this routine is called by
addToReplicasMap() for every blockfile in the directory tree, and it does a full listing of
each file's containing directory on every call.  This is equivalent to doing many MORE
full disk scans.  The underlying disk i/o buffers probably prevent disk thrashing, but we
are still creating bazillions of unnecessary File objects that need to be GC'ed.  There is
a simple refactoring that prevents this.
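The refactoring in item 4 amounts to listing each directory once and passing the cached listing down to the lookup, rather than re-listing the directory for every block file.  A hedged sketch, assuming the standard HDFS block-file naming (blk_&lt;id&gt; with a matching blk_&lt;id&gt;_&lt;genstamp&gt;.meta) and an illustrative method signature:

```java
// Sketch (illustrative signature): find a block's generation stamp from an
// already-cached listing of its containing directory, so the directory is
// listed once per directory rather than once per block file.
public class GenStampLookup {
    // stamp assigned to pre-upgrade blocks with no .meta file (assumed value)
    static final long GRANDFATHER_GENERATION_STAMP = 0;

    public static long getGenerationStamp(String[] cachedListing, String blockName) {
        String prefix = blockName + "_";
        for (String name : cachedListing) {
            if (name.startsWith(prefix) && name.endsWith(".meta")) {
                // blk_<id>_<genstamp>.meta -> extract <genstamp>
                String stamp = name.substring(prefix.length(),
                                              name.length() - ".meta".length());
                return Long.parseLong(stamp);
            }
        }
        return GRANDFATHER_GENERATION_STAMP;  // no meta file found
    }
}
```

With the listing cached by the caller, the lookup does no disk I/O at all and allocates no File objects, which addresses both the redundant scans and the GC pressure noted above.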

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
