hadoop-hdfs-issues mailing list archives

From "Ayush Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13927) Improve TestDataNodeMultipleRegistrations#testDNWithInvalidStorageWithHA wait
Date Wed, 26 Sep 2018 04:42:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628256#comment-16628256 ]

Ayush Saxena commented on HDFS-13927:
-------------------------------------

{noformat}
2018-09-25 06:00:28,162 [IPC Server listener on 46808] INFO  ipc.Server (Server.java:run(1153))
- IPC Server listener on 46808: starting
2018-09-25 06:00:28,165 [main] INFO  namenode.NameNode (NameNode.java:startCommonServices(815))
- NameNode RPC up at: localhost/127.0.0.1:46808

2018-09-25 06:00:28,251 [IPC Server listener on 41229] INFO  ipc.Server (Server.java:run(1153))
- IPC Server listener on 41229: starting
2018-09-25 06:00:28,254 [main] INFO  namenode.NameNode (NameNode.java:startCommonServices(815))
- NameNode RPC up at: localhost/127.0.0.1:41229
{noformat}

{noformat}
2018-09-25 06:00:28,293 [Thread-1152] WARN  datanode.DataNode (BPServiceActor.java:retrieveNamespaceInfo(235))
- Problem connecting to server: localhost/127.0.0.1:41229
2018-09-25 06:00:28,293 [Thread-1151] WARN  datanode.DataNode (BPServiceActor.java:retrieveNamespaceInfo(235))
- Problem connecting to server: localhost/127.0.0.1:46808
{noformat}

Analysed the failure logs. Because of a gap of a few milliseconds, the DN is not able to
connect to the NameNodes, even though both NNs have started. It therefore gets a connection
failure and sleeps for an additional 5 seconds before retrying to connect. So the DN takes
5 seconds plus an additional 5 seconds to report the failed state. This millisecond gap seems
beyond our control and is totally machine specific, so we have to increase the timeout to
handle such an unfortunate case. Will upload an addendum patch that increases the timeout.
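
To illustrate the timing argument, here is a minimal, self-contained sketch of a polling wait whose timeout covers the 5 seconds plus the additional 5 second retry sleep described above. It is not the content of the addendum patch. The class name, the `waitFor` helper, the 2 second safety margin, and the simulated condition are all assumptions made for this example. Hadoop's own test utilities offer a similar helper (`GenericTestUtils.waitFor`), which this sketch deliberately does not depend on.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitSketch {
  // From the analysis above: 5 s to report the failure, plus one extra
  // 5 s sleep if the first connection attempt fails.
  static final long WORST_CASE_MILLIS = 5_000L + 5_000L;
  // Assumed safety margin for slow machines (hypothetical value).
  static final long MARGIN_MILLIS = 2_000L;

  /** Polls the check until it returns true or the timeout elapses. */
  public static void waitFor(BooleanSupplier check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.nanoTime();
    // Hypothetical stand-in for "the DN has reported the failed state":
    // becomes true roughly 300 ms after start.
    BooleanSupplier dnFailed =
        () -> System.nanoTime() - start >= 300_000_000L;
    // Returns as soon as the condition holds; the timeout is only an upper bound.
    waitFor(dnFailed, 100, WORST_CASE_MILLIS + MARGIN_MILLIS);
    System.out.println("failed state observed");
  }
}
```

Polling against an upper bound keeps the common case fast: the wait ends as soon as the condition holds, and the larger timeout only matters on machines that hit the extra retry sleep.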

> Improve TestDataNodeMultipleRegistrations#testDNWithInvalidStorageWithHA wait
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-13927
>                 URL: https://issues.apache.org/jira/browse/HDFS-13927
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Minor
>             Fix For: 3.2.0
>
>         Attachments: HDFS-13927-01.patch, HDFS-13927-02.patch
>
>
> Remove the explicit wait in the test for failed datanode with exact time required for
> the process to confirm the status.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

