hadoop-hdfs-issues mailing list archives

From "T Meyarivan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-1480) All replicas for a block with repl=2 end up in same rack
Date Wed, 23 Mar 2011 19:20:06 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Meyarivan updated HDFS-1480:
------------------------------

    Description: 
It appears that all replicas of a block can end up in the same rack. The likelihood of such placement seems to be directly related to the decommissioning of nodes.

After a rolling OS upgrade (decommission 3-10% of nodes, re-install, add them back) of a running cluster, all replicas of about 0.16% of blocks ended up in the same rack.

The Hadoop Namenode UI does not surface such incorrectly replicated blocks; "hadoop fsck .." does report that the blocks must be replicated on additional racks.

Looking at ReplicationTargetChooser.java, the following seem suspect:

snippet-01:
{code}
    int maxNodesPerRack =
      (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
{code}
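For repl=2 this formula never tightens to one node per rack: integer division makes the first term 0 on any cluster with at least two racks, so maxNodesPerRack comes out as 2 and a rack holding both replicas still satisfies the per-rack cap. A minimal check of just this arithmetic (standalone sketch, not the surrounding chooser code):

```java
public class MaxNodesPerRack {
    // Same arithmetic as snippet-01: (totalNumOfReplicas-1)/numRacks + 2
    static int maxNodesPerRack(int totalNumOfReplicas, int numRacks) {
        return (totalNumOfReplicas - 1) / numRacks + 2;
    }

    public static void main(String[] args) {
        // With repl=2, any cluster with >= 2 racks yields a limit of 2,
        // so two replicas on one rack do not violate the per-rack cap.
        for (int racks = 2; racks <= 100; racks++) {
            assert maxNodesPerRack(2, racks) == 2;
        }
        System.out.println(maxNodesPerRack(2, 40)); // prints 2
    }
}
```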

snippet-02:
{code}
      case 2:
        if (clusterMap.isOnSameRack(results.get(0), results.get(1))) {
          chooseRemoteRack(1, results.get(0), excludedNodes,
                           blocksize, maxNodesPerRack, results);
        } else if (newBlock){
          chooseLocalRack(results.get(1), excludedNodes, blocksize,
                          maxNodesPerRack, results);
        } else {
          chooseLocalRack(writer, excludedNodes, blocksize,
                          maxNodesPerRack, results);
        }
        if (--numOfReplicas == 0) {
          break;
        }
{code}
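One way the same-rack outcome could arise, sketched as a toy model (the class and method names below are illustrative, not the real ReplicationTargetChooser API): if the remote-rack pass finds no eligible node — plausible while a sizable fraction of nodes is decommissioning and excluded — and the chooser then falls back to a wider random pick, a local-rack node is an acceptable target under a per-rack cap of 2.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a remote-rack failure followed by a fallback pick.
public class FallbackSketch {
    record Node(String name, String rack) {}

    static List<Node> pickReplicas(Node writerLocal, List<Node> candidates,
                                   int maxNodesPerRack) {
        List<Node> results = new ArrayList<>();
        results.add(writerLocal);                       // first replica: writer-local node
        Node remote = candidates.stream()
            .filter(n -> !n.rack().equals(writerLocal.rack()))
            .findFirst().orElse(null);                  // second replica: try a remote rack
        if (remote != null) {
            results.add(remote);
        } else {
            // Fallback: any candidate within the per-rack cap is accepted,
            // which with a cap of 2 includes a node on the writer's own rack.
            for (Node n : candidates) {
                long sameRack = results.stream()
                    .filter(r -> r.rack().equals(n.rack())).count();
                if (sameRack < maxNodesPerRack && !results.contains(n)) {
                    results.add(n);
                    break;
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Node writer = new Node("dn1", "rackA");
        // Only same-rack candidates remain eligible (remote racks excluded):
        List<Node> candidates = List.of(new Node("dn2", "rackA"));
        List<Node> placed = pickReplicas(writer, candidates, 2);
        // Both replicas land on rackA and the cap of 2 does not object.
        System.out.println(placed);
    }
}
```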

snippet-03:
{code}
    do {
      DatanodeDescriptor[] selectedNodes =
        chooseRandom(1, nodes, excludedNodes);
      if (selectedNodes.length == 0) {
        throw new NotEnoughReplicasException(
                                             "Not able to place enough replicas");
      }
      result = (DatanodeDescriptor)(selectedNodes[0]);
    } while(!isGoodTarget(result, blocksize, maxNodesPerRack, results));
{code}
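The loop above rejects a candidate only through isGoodTarget, and the rack-count part of that check compares against maxNodesPerRack. A simplified version of just that rack-count test (hypothetical names; the real method also considers remaining space, load, and decommission state) shows that with a limit of 2 a second same-rack node is still a "good" target:

```java
import java.util.List;

public class RackCountCheck {
    // Simplified rack-count portion of isGoodTarget; the real check also
    // looks at remaining space, xceiver load, and decommission status.
    static boolean withinRackLimit(String candidateRack, List<String> chosenRacks,
                                   int maxNodesPerRack) {
        long onSameRack = chosenRacks.stream()
            .filter(r -> r.equals(candidateRack)).count();
        return onSameRack + 1 <= maxNodesPerRack;
    }

    public static void main(String[] args) {
        // One replica already on rackA; with maxNodesPerRack=2 a second
        // rackA node still passes, so same-rack placement is not filtered here.
        System.out.println(withinRackLimit("rackA", List.of("rackA"), 2)); // prints true
        System.out.println(withinRackLimit("rackA", List.of("rackA", "rackA"), 2)); // prints false
    }
}
```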

  was:
It appears that all replicas of a block can end up in the same rack. The likelihood of such placement seems to be directly related to the decommissioning of nodes.

After a rolling OS upgrade (decommission 3-10% of nodes, re-install, add them back) of a running cluster, all replicas of about 0.16% of blocks ended up in the same rack.

The Hadoop Namenode UI does not surface such incorrectly replicated blocks; "hadoop fsck .." does report that the blocks must be replicated on additional racks.

Looking at ReplicationTargetChooser.java, the following seem suspect:

snippet-01:
{code}
    int maxNodesPerRack =
      (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
{code}

snippet-02:
{code}
    if (counter>maxTargetPerLoc) {
      logr.debug("Node "+NodeBase.getPath(node)+
                " is not chosen because the rack has too many chosen nodes");
      return false;
    }
{code}

snippet-03:
{code}
      default:
        chooseRandom(numOfReplicas, NodeBase.ROOT, excludedNodes,
                     blocksize, maxNodesPerRack, results);
      }
{code}

        Summary: All replicas for a block with repl=2 end up in same rack  (was: All replicas for a block end up in same rack)

> All replicas for a block with repl=2 end up in same rack
> --------------------------------------------------------
>
>                 Key: HDFS-1480
>                 URL: https://issues.apache.org/jira/browse/HDFS-1480
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.2
>            Reporter: T Meyarivan
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
