hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
Date Thu, 18 Dec 2014 05:42:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251244#comment-14251244
] 

Binglin Chang commented on HDFS-7527:
-------------------------------------

Make sense, looks like the behavior is changed at some point. 
Update the patch to partially support dfs.datanode.hostname(if it is an ip address, or the
hostname resolve to a proper ip address). 
And add change to test to properly wait for the excluded datanode become back again(using
Datanode.isDatanodeFullyStarted rather than checking ALIVE node count).
Note that too fully restore the old behavior requires a lot more changes, currently I only
made minimal changes.

> TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-7527
>                 URL: https://issues.apache.org/jira/browse/HDFS-7527
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, test
>            Reporter: Yongjun Zhang
>            Assignee: Binglin Chang
>         Attachments: HDFS-7527.001.patch
>
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
> {quote}
> Error Message
> test timed out after 360000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 360000 milliseconds
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
> 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization
failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service
to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host
is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279,
infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
> 	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
> 	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
> 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization
failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service
to localhost/127.0.0.1:40565. Exiting. 
> java.io.IOException: DN shut down before block pool connected
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
> 	at java.lang.Thread.run(Thread.java:745)
> {quote}
> Found by tool proposed in HADOOP-11045:
> {quote}
> [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk
-n 5 | tee bt.log
> ****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk
>     THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed
below:
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01)
>     Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
>     Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27)
>     Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01)
>     Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
> ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01)
>     Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
>     Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
> Among 6 runs examined, all failed tests <#failedRuns: testName>:
>     3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline
>     2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName
>     2: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect
>     1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message