hadoop-hdfs-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8995) Flaw in registration bookeeping can make DN die on reconnect
Date Wed, 02 Sep 2015 01:41:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726562#comment-14726562 ]

Hudson commented on HDFS-8995:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #339 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/339/])
HDFS-8995. Flaw in registration bookeeping can make DN die on reconnect. (Kihwal Lee via yliu)
(yliu: rev 5652131d2ea68c408dd3cd8bee31723642a8cdde)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Flaw in registration bookeeping can make DN die on reconnect
> ------------------------------------------------------------
>
>                 Key: HDFS-8995
>                 URL: https://issues.apache.org/jira/browse/HDFS-8995
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>             Fix For: 2.7.2
>
>         Attachments: HDFS-8995.patch
>
>
> Normally, datanodes re-register with the namenode after it has been unreachable for longer than the heartbeat expiration interval and then becomes reachable again. A datanode keeps retrying its last RPC call, such as an incremental block report or a heartbeat, and when the call finally gets through, the namenode tells the datanode to re-register.
> We have observed that some datanodes stay dead in such scenarios. Further investigation has revealed that those datanodes were told to shut down by the namenode.
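
For readers unfamiliar with the flow described above, here is a minimal sketch of the normal (non-failing) reconnect path: the datanode retries its last call until the namenode answers, and the namenode replies with a re-register request because the heartbeat has expired. This is a toy model, not the Hadoop code. ReRegisterSketch, ToyNameNode, Reply, and retryUntilAnswered are hypothetical names invented for illustration, and the sketch does not model the shutdown case reported in this issue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reconnect flow; these are NOT the real Hadoop classes. */
public class ReRegisterSketch {

    /** Hypothetical replies a namenode could give to a datanode call. */
    enum Reply { OK, REREGISTER }

    /** Hypothetical namenode that can be unreachable and can forget a datanode. */
    static class ToyNameNode {
        boolean reachable = true;
        boolean knowsDataNode = true;

        Reply handleRpc() throws IOException {
            if (!reachable) {
                throw new IOException("namenode unreachable");
            }
            // A datanode whose heartbeat expired is asked to register again.
            return knowsDataNode ? Reply.OK : Reply.REREGISTER;
        }

        void register() {
            knowsDataNode = true;
        }
    }

    /**
     * Retries the same call until it gets through. The outage is simulated:
     * the namenode becomes reachable after failedAttempts failed tries.
     */
    static List<String> retryUntilAnswered(ToyNameNode nn, int failedAttempts) {
        List<String> log = new ArrayList<>();
        int attempt = 0;
        while (true) {
            if (attempt == failedAttempts) {
                nn.reachable = true; // simulated end of the outage
            }
            try {
                Reply reply = nn.handleRpc();
                if (reply == Reply.REREGISTER) {
                    nn.register();
                    log.add("re-registered");
                } else {
                    log.add("ok");
                }
                return log;
            } catch (IOException e) {
                log.add("retry");
                attempt++;
            }
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.reachable = false;      // namenode is unreachable
        nn.knowsDataNode = false;  // heartbeat expired during the outage
        System.out.println(retryUntilAnswered(nn, 3));
    }
}
```

In this model the datanode always recovers once the call gets through; the issue reports that, in practice, some datanodes were instead told to shut down.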



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
