hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10320) Rack failures may result in NN terminate
Date Wed, 04 May 2016 18:03:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271102#comment-15271102
] 

Ming Ma commented on HDFS-10320:
--------------------------------

bq. for subclasses
Good catch. Maybe rename the method or add more comments there.

Another thing, in NetworkTopology.java's {{chooseRandom(final String scope, String excludedScope,
      final Collection<Node> excludedNodes)}}, there is an existing {{numOfDatanodes}},
can it take excludes into account or do you still need to recompute another {{availableNodes}}?

> Rack failures may result in NN terminate
> ----------------------------------------
>
>                 Key: HDFS-10320
>                 URL: https://issues.apache.org/jira/browse/HDFS-10320
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HDFS-10320.01.patch, HDFS-10320.02.patch, HDFS-10320.03.patch, HDFS-10320.04.patch,
HDFS-10320.05.patch
>
>
> If there're rack failures which end up leaving only 1 rack available, {{BlockPlacementPolicyDefault#chooseRandom}}
may get {{InvalidTopologyException}} when calling {{NetworkTopology#chooseRandom}}, which
then throws all the way out to {{BlockManager}}'s {{ReplicationMonitor}} thread and terminate
the NN.
> Log:
> {noformat}
> 2016-02-24 09:22:01,514  WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy:
Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true)
For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-02-24 09:22:01,958  ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
ReplicationMonitor thread received Runtime exception. 
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Failed to find datanode
(scope="" excludedScope="/rack_a5").
> 	at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:729)
> 	at org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:694)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:635)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:580)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:348)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:214)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:111)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3746)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3711)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1400)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1306)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3682)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3634)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message