From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10320) Rack failures may result in NN terminate
Date Wed, 04 May 2016 19:09:13 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HDFS-10320:
-----------------------------
    Attachment: HDFS-10320.06.patch

bq. Maybe rename the method or add more comments there.
Sure, I updated the original comment before that {{addToExcludedNodes}} call.
bq. {{numOfDatanodes}} v.s. {{availableNodes}} in NetworkTopology.java's chooseRandom
This is the fun part. :) They're different things. {{InnerNode#getNumOfLeaves}} returns the total number of leaves, and the 'randomly choose 1' is done by {{innerNode.getLeaf(leaveIndex, node)}}, which takes a randomly generated index and the (top-most ancestor) node from {{excludedScope}}. When coming up with patch 3, I checked all the way in for the feasibility of adding {{excludedNodes}} to {{getLeaf}}, but decided to keep the current implementation for 2 reasons:
- Less change. We don't have to change all the way into {{InnerNode}} for this bug fix, hence less effort.
- It is more consistent with the current behavior. Currently we loop in BPPD: if we get a node that's already excluded, we call {{chooseDataNode}} again. This patch simply moves that loop inside (see the sketch below).
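
For clarity, here is a minimal standalone sketch of that retry-loop idea, using plain Java collections with made-up names (it is not the actual NetworkTopology/BPPD code). {{numOfDatanodes}} plays the role of the total leaf count the random index is drawn from, while {{availableNodes}} is the count that is actually choosable:
{code:java}
import java.util.*;

public class ChooseRandomSketch {
  /**
   * Pick a random leaf from 'leaves' that is not in 'excludedNodes'.
   * leaves.size() stands in for numOfDatanodes (total leaves under the scope);
   * availableNodes is what is actually left to choose from.
   */
  static String chooseRandom(List<String> leaves, Set<String> excludedNodes, Random r) {
    int numOfDatanodes = leaves.size();
    int availableNodes = numOfDatanodes - countExcluded(leaves, excludedNodes);
    if (availableNodes <= 0) {
      return null;                                // nothing choosable, don't throw
    }
    // Retry-loop idea: draw a random index over all leaves, map it to a node
    // (the stand-in for innerNode.getLeaf(...)), and redraw if it is excluded.
    while (true) {
      int leafIndex = r.nextInt(numOfDatanodes);
      String candidate = leaves.get(leafIndex);
      if (!excludedNodes.contains(candidate)) {
        return candidate;
      }
    }
  }

  static int countExcluded(List<String> leaves, Set<String> excludedNodes) {
    int n = 0;
    for (String leaf : leaves) {
      if (excludedNodes.contains(leaf)) {
        n++;
      }
    }
    return n;
  }

  public static void main(String[] args) {
    List<String> leaves = Arrays.asList("dn1", "dn2", "dn3", "dn4");
    Set<String> excluded = new HashSet<>(Arrays.asList("dn2", "dn3"));
    System.out.println(chooseRandom(leaves, excluded, new Random()));
  }
}
{code}
As long as {{availableNodes}} is positive the retry loop terminates with probability 1; the cost is extra iterations when many leaves are excluded, which is the trade-off discussed next.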

One implementation detail I also considered: in NetworkTopology.java's chooseRandom, without changing {{InnerNode}}, we could maintain an index mapping of the available nodes, randomly choose an index from the mapping, and then get the node using that index. If that node is in excludedNodes, we remove its index from the mapping. Although this would make the loop run fewer iterations (since each time a different node comes out of the set), for an HDFS cluster with an enormous number of DNs, the space consumption and the overhead of setting up the index mapping outweigh the benefit. I assume this is why we had that simple loop in BPPD in the first place.
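
Roughly, that alternative would look like the following toy sketch (again plain Java collections and made-up names, not a proposed patch): the excluded pick is removed from the candidate list so it can never be drawn again, at the price of copying the whole leaf list up front.
{code:java}
import java.util.*;

public class IndexMappingSketch {
  /**
   * Alternative considered above: keep a mutable mapping of candidate leaves,
   * draw a random position, and drop the pick if it is excluded so the next
   * draw never sees it. Fewer iterations, but O(#leaves) time/space per call.
   */
  static String chooseRandom(List<String> leaves, Set<String> excludedNodes, Random r) {
    List<String> candidates = new ArrayList<>(leaves);  // the "index mapping"
    while (!candidates.isEmpty()) {
      int i = r.nextInt(candidates.size());
      String candidate = candidates.get(i);
      if (!excludedNodes.contains(candidate)) {
        return candidate;
      }
      candidates.remove(i);     // excluded: remove so it cannot be drawn again
    }
    return null;                // every leaf was excluded
  }

  public static void main(String[] args) {
    List<String> leaves = Arrays.asList("dn1", "dn2", "dn3", "dn4");
    Set<String> excluded = new HashSet<>(Collections.singleton("dn4"));
    System.out.println(chooseRandom(leaves, excluded, new Random()));
  }
}
{code}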

Please let me know what you think. Thanks!

> Rack failures may result in NN terminate
> ----------------------------------------
>
>                 Key: HDFS-10320
>                 URL: https://issues.apache.org/jira/browse/HDFS-10320
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HDFS-10320.01.patch, HDFS-10320.02.patch, HDFS-10320.03.patch, HDFS-10320.04.patch,
HDFS-10320.05.patch, HDFS-10320.06.patch
>
>
> If there are rack failures which end up leaving only 1 rack available, {{BlockPlacementPolicyDefault#chooseRandom}}
may get {{InvalidTopologyException}} when calling {{NetworkTopology#chooseRandom}}, which
then throws all the way out to {{BlockManager}}'s {{ReplicationMonitor}} thread and terminates
the NN.
> Log:
> {noformat}
> 2016-02-24 09:22:01,514  WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy:
Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true)
For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-02-24 09:22:01,958  ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
ReplicationMonitor thread received Runtime exception. 
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Failed to find datanode
(scope="" excludedScope="/rack_a5").
> 	at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:729)
> 	at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:694)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:635)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:580)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:348)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:214)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:111)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3746)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3711)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1400)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1306)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3682)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3634)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}



