hadoop-hdfs-dev mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-1851) HDFS-15 scalability improvements
Date Thu, 21 Apr 2011 03:24:05 GMT
HDFS-15 scalability improvements
--------------------------------

                 Key: HDFS-1851
                 URL: https://issues.apache.org/jira/browse/HDFS-1851
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
    Affects Versions: 0.21.1, 0.22.0, 0.23.0
            Reporter: Eli Collins


While discussing HDFS-15 with Todd, he observed that BlockManager#isReplicationNeeded
is expensive at scale because it iterates over the list of all datanodes. If this method is
in fact called frequently enough to be measurable, there are some places we can probably help.


There are a couple of places where we can remove calls to isReplicationNeeded:
* In processPendingReplications, we can unconditionally add the block to neededReplications
(w/o checking isReplicationNeeded) because a block wouldn't be in pendingReplications (which
is where the timed out blocks come from) if it didn't need replication.
* BlockManager#blockHasEnoughRacks could be modified to only call getNumberOfRacks conditionally
for the replFactor=1 case.

There are some simple improvements we could make to the method itself, eg:
* Allocate the hash set with an initial size of 2 rather than 0 because we'd expect a block
to be available on 2 racks. ArrayList(3) might be a better option here since n is small. 
* It also doesn't need to check rackSet.contains before calling rackSet.add since rackSet
is a set. 
* Keep a count numUniqueRacks (if (rackSet.add(name)) numUniqueRacks++) rather than call Set#size.
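
A minimal sketch of what the three suggestions above might look like together. The class
RackCounter, the method countUniqueRacks, and the use of plain rack-name strings are
hypothetical and only for illustration; this is not the actual BlockManager code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of HDFS.
final class RackCounter {
    private RackCounter() {}

    /** Counts the distinct rack names among a block's replica locations. */
    static int countUniqueRacks(List<String> rackNames) {
        // Pre-size for the expected common case of a block spanning 2 racks.
        Set<String> rackSet = new HashSet<>(2);
        int numUniqueRacks = 0;
        for (String name : rackNames) {
            // Set#add returns true only when the element was not already present,
            // so a separate rackSet.contains(name) check is redundant.
            if (rackSet.add(name)) {
                numUniqueRacks++;
            }
        }
        return numUniqueRacks;
    }
}
```

For example, countUniqueRacks(Arrays.asList("/rack1", "/rack1", "/rack2")) counts two
racks. Whether the running counter is measurably cheaper than Set#size would need profiling.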

A couple other things I noticed while looking at the code:
* The logic in isReplicationInProgress that checks curReplicas < curExpectedReplicas right
after isNeededReplication (a more stringent check) needs validating/a comment. I think the
intent is to count the block as under-replicated only if it has an insufficient total
number of replicas (vs not being on enough racks) because we do not want to prevent the given DN
from being decommissioned just because its blocks are not replicated across enough racks (which
may not even be possible if there's 1 rack and a topology script is configured). I.e., without
this check a DN could never be decommissioned if a topology script is enabled and there's just
1 rack. Needs a test.
* A better name for shouldCheckForEnoughRacks would be isMultiRack, and blockNeedsReplication
would be a better name than isNeededReplication.
* We should warn if a topology script is configured and only 1 rack is used, because in this
case all blocks will always stay in the neededReplications queue, which is correct but may
slow things down or consume substantial memory.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
