hadoop-hdfs-issues mailing list archives

From "Stephen O'Donnell (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-14637) Namenode may not replicate blocks to meet the policy after enabling upgradeDomain
Date Mon, 08 Jul 2019 19:29:00 GMT
Stephen O'Donnell created HDFS-14637:

             Summary: Namenode may not replicate blocks to meet the policy after enabling upgradeDomain
                 Key: HDFS-14637
                 URL: https://issues.apache.org/jira/browse/HDFS-14637
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.3.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell

After changing the network topology or placement policy on a cluster and restarting the namenode,
the namenode will scan all blocks on the cluster at startup, and check if they meet the current
placement policy. If they do not, they are added to the replication queue and the namenode
will arrange for them to be replicated to ensure the placement policy is used.

If you start with a cluster with no UpgradeDomain, and then enable UpgradeDomain, then on
restart the NN does notice that all the blocks violate the placement policy and it adds them to
the replication queue. I believe there are some issues in the logic that prevent the blocks
from replicating, depending on the setup:

With UD enabled, but no racks configured, and possibly on a 2 rack cluster, the queued replication
work never makes any progress. In blockManager.validateReconstructionWork(), the NN checks
whether the new replica increases the number of racks, and if it does not, it skips the work and
tries again later:
DatanodeStorageInfo[] targets = rw.getTargets();
if ((numReplicas.liveReplicas() >= requiredRedundancy) &&
    (!isPlacementPolicySatisfied(block)) ) {
  if (!isInNewRack(rw.getSrcNodes(), targets[0].getDatanodeDescriptor())) {
    // No use continuing, unless a new rack in this case
    return false;
  }
  // mark that the reconstruction work is to replicate internal block to a
  // new rack.
}
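
As a rough illustration of why this check can stall, the sketch below models it outside of HDFS. The Node class, its fields, and isInNewUpgradeDomain() are hypothetical stand-ins rather than HDFS classes, and isInNewRack() is a simplified version of the rack comparison described above. It assumes a cluster with no topology configured, where every datanode reports the default rack /default-rack.

```java
import java.util.Arrays;
import java.util.List;

public class NewRackCheckSketch {
    // Hypothetical stand-in for a datanode: a rack path plus an upgrade domain.
    public static final class Node {
        final String rack;
        final String upgradeDomain;

        public Node(String rack, String upgradeDomain) {
            this.rack = rack;
            this.upgradeDomain = upgradeDomain;
        }
    }

    // Simplified rack test: the target only counts as "new" if no source
    // node shares its rack.
    public static boolean isInNewRack(List<Node> sources, Node target) {
        for (Node src : sources) {
            if (src.rack.equals(target.rack)) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical equivalent test for upgrade domains, for comparison only.
    public static boolean isInNewUpgradeDomain(List<Node> sources, Node target) {
        for (Node src : sources) {
            if (src.upgradeDomain.equals(target.upgradeDomain)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // No racks configured: all replicas sit on the default rack and,
        // in this example, in the same upgrade domain.
        List<Node> sources = Arrays.asList(
            new Node("/default-rack", "ud1"),
            new Node("/default-rack", "ud1"),
            new Node("/default-rack", "ud1"));
        // A target that would fix the upgrade domain violation.
        Node target = new Node("/default-rack", "ud2");

        System.out.println("adds a new rack: " + isInNewRack(sources, target));
        System.out.println("adds a new upgrade domain: "
            + isInNewUpgradeDomain(sources, target));
    }
}
```

In this model the target improves upgrade domain spread but can never pass the rack test, so a caller that returns false on a failed rack test would reject the work every time it is retried.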

Additionally, in blockManager.scheduleReconstruction() there is some logic that sets the number
of new replicas required to one if the live replicas >= requiredRedundancy:
int additionalReplRequired;
if (numReplicas.liveReplicas() < requiredRedundancy) {
  additionalReplRequired = requiredRedundancy - numReplicas.liveReplicas()
      - pendingNum;
} else {
  additionalReplRequired = 1; // Needed on a new rack
}
With UD, it is possible for 2 new replicas to be needed to meet the block placement policy,
if all existing replicas are on nodes with the same domain. For traditional '2 rack redundancy',
only 1 new replica would ever have been needed in this scenario.
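
The arithmetic behind this can be sketched as follows. This is a simplified model, not the HDFS implementation: it assumes the policy wants replicas spread over at least min(upgrade domain factor, replica count) distinct domains, and the method and parameter names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UpgradeDomainShortfallSketch {
    // Assumed rule: replicas should span at least
    // min(upgradeDomainFactor, number of replicas) distinct upgrade domains.
    // Returns how many replicas in new domains would be needed to get there.
    public static int additionalReplicasForDomains(List<String> replicaDomains,
                                                   int upgradeDomainFactor) {
        Set<String> distinct = new HashSet<>(replicaDomains);
        int requiredDomains = Math.min(upgradeDomainFactor, replicaDomains.size());
        return Math.max(0, requiredDomains - distinct.size());
    }

    // Mirrors the else branch quoted above: once enough live replicas exist,
    // exactly one additional replica is scheduled.
    public static int scheduledWhenFullyReplicated() {
        return 1;
    }

    public static void main(String[] args) {
        // Three live replicas, all in the same upgrade domain, factor of 3.
        List<String> domains = Arrays.asList("ud1", "ud1", "ud1");
        int needed = additionalReplicasForDomains(domains, 3);

        System.out.println("replicas needed in new domains: " + needed);
        System.out.println("replicas scheduled: " + scheduledWhenFullyReplicated());
    }
}
```

Under these assumptions the block needs two replicas in new domains but only one is scheduled per pass, whereas a rack-only violation (all replicas on one rack, two racks required) has a shortfall of exactly one.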
