hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8586) Dead Datanode is allocated for write when client is from deadnode
Date Tue, 23 Jun 2015 04:43:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597156#comment-14597156
] 

Vinayakumar B commented on HDFS-8586:
-------------------------------------

Thanks [~brahmareddy] for reporting this.
This happens when the NameNode already has the node in its list of dead nodes and a block
allocation request comes from the same machine as the dead node: the dead node is then chosen
as the local node, irrespective of whether it is still part of the cluster. Adding a check in
{{BlockPlacementPolicyDefault.java#chooseLocalStorage(..)}} will fix this.
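
To illustrate the kind of check meant here, below is a minimal, self-contained sketch. It is not the actual HDFS code or the eventual patch: {{LocalNodeSketch}}, {{Topology}}, {{chooseLocal}} and the node names are hypothetical stand-ins, and the real placement policy works on datanode descriptors and storage types rather than plain strings. The only point shown is that the writer's own node is accepted as the local target only if the topology still contains it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified stand-in for the NameNode's view of the cluster (hypothetical, not HDFS classes).
final class LocalNodeSketch {

    /** Hypothetical minimal topology: the set of nodes currently registered in the cluster. */
    static final class Topology {
        private final Set<String> nodes = new HashSet<>();

        void add(String node) { nodes.add(node); }

        // Models what happens when a dead datanode is removed from the topology.
        void remove(String node) { nodes.remove(node); }

        boolean contains(String node) { return nodes.contains(node); }
    }

    /**
     * Picks the first write target. The writer's own node is preferred, but only
     * if the topology still contains it; this membership test is the added check.
     * Otherwise the first candidate that is still in the topology is used.
     */
    static Optional<String> chooseLocal(Topology topology, String writerNode,
                                        List<String> candidates) {
        if (writerNode != null && topology.contains(writerNode)) {
            return Optional.of(writerNode);
        }
        for (String candidate : candidates) {
            if (topology.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Topology topology = new Topology();
        topology.add("dn1");
        topology.add("dn2");
        List<String> candidates = Arrays.asList("dn1", "dn2");

        // While dn1 is part of the cluster, a writer on dn1 gets dn1 as local target.
        System.out.println(chooseLocal(topology, "dn1", candidates));

        // After dn1 is declared dead and removed, the same writer falls back to dn2.
        topology.remove("dn1");
        System.out.println(chooseLocal(topology, "dn1", candidates));
    }
}
```

Without the {{topology.contains(writerNode)}} test, the first branch would return the dead node whenever the client runs on that machine, which matches the allocation seen in the logs quoted below.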

Regarding the test proposed above, it will not always fail: it is a MiniDFSCluster test, so
all datanodes are on the same machine, and the dead node is not guaranteed to be chosen as
the local storage.

> Dead Datanode is allocated for write when client is  from deadnode
> ------------------------------------------------------------------
>
>                 Key: HDFS-8586
>                 URL: https://issues.apache.org/jira/browse/HDFS-8586
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Critical
>
>  *{color:blue}DataNode marked as Dead{color}* 
> 2015-06-11 19:39:00,862 | INFO  | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
| BLOCK*  *removeDeadDatanode: lost heartbeat from XX.XX.39.33:25009*  | org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:584)
> 2015-06-11 19:39:00,863 | INFO  | org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@28ec166e
| Removing a node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.remove(NetworkTopology.java:488)
>   *{color:blue}Deadnode got Allocated{color}* 
> 2015-06-11 19:39:45,148 | WARN  | IPC Server handler 26 on 25000 | The cluster does not
contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
> 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The cluster does not
contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
> 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The cluster does not
contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
> 2015-06-11 19:39:45,149 | WARN  | IPC Server handler 26 on 25000 | The cluster does not
contain node: /default/rack3/XX.XX.39.33:25009 | org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:616)
> 2015-06-11 19:39:45,149 | INFO  | IPC Server handler 26 on 25000 | BLOCK*  *allocate
blk_1073754030_13252* {UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1,
replicas=[ReplicaUC[[DISK]DS-e8d29773-dfc2-4224-b1d6-9b0588bca55e:NORMAL:{color:red}XX.XX.39.33:25009{color}|RBW],
 ReplicaUC[[DISK]DS-f7d2ab3c-88f7-470c-9097-84387c0bec83:NORMAL:XX.XX.38.32:25009|RBW], ReplicaUC[[DISK]DS-8c2a464a-ac81-4651-890a-dbfd07ddd95f:NORMAL:
*XX.XX.38.33:25009|RBW]]* } for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)
> 2015-06-11 19:39:45,191 | INFO  | IPC Server handler 35 on 25000 | BLOCK* allocate blk_1073754031_13253{UCState=UNDER_CONSTRUCTION,
truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-ed8ad579-50c0-4e3e-8780-9776531763b6:NORMAL:XX.XX.39.31:25009|RBW],
ReplicaUC[[DISK]DS-19ddd6da-4a3e-481a-8445-dde5c90aaff3:NORMAL:XX.XX.37.32:25009|RBW], ReplicaUC[[DISK]DS-4ce4ce39-4973-42ce-8c7d-cb41f899db85:
{{NORMAL:XX.XX.37.33:25009}}   |RBW]]} for /t1._COPYING_ | org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3657)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
