hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11682) TestBalancer#testBalancerWithStripedFile is flaky
Date Thu, 25 May 2017 18:48:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025201#comment-16025201 ]

Manoj Govindassamy commented on HDFS-11682:
-------------------------------------------

[~eddyxu],

{noformat}
  static void waitForBalancer(long totalUsedSpace, long totalCapacity,
      ClientProtocol client, MiniDFSCluster cluster, BalancerParameters p,
      int expectedExcludedNodes) throws IOException, TimeoutException {

    do {
      DatanodeInfo[] datanodeReport =
          client.getDatanodeReport(DatanodeReportType.ALL);
     .. ..
      }
      assertEquals(expectedExcludedNodes,actualExcludedNodeCount);
    } while (!balanced);
{noformat}

waitForBalancer() already calls getDatanodeReport() in a loop and retries for 40 seconds.
When the balancer moves blocks around, the NameNode should learn about the new block locations
on the other DataNodes as they happen, so without a restart the NN shouldn't need a full block
report to know the DNs' latest state. If the problem is that the NameNode doesn't have the
latest details about the DataNodes, how about explicitly triggering block reports from all the
DataNodes before or inside the loop? I'm not sure that increasing the wait time from 40 seconds
to 2 minutes will solve the problem, since the default block report interval
(dfs.blockreport.intervalMsec) is 6 hours and the test does not override it. Please correct me
if I got this wrong.
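
The suggestion of refreshing NameNode state on every pass of the wait loop could be sketched roughly as follows. This is a self-contained illustration, not the actual TestBalancer code: the class RefreshingWait, the Refresh interface, and the waitFor() signature are all hypothetical names made up for the example. In a real test the refresh step would presumably be something like MiniDFSCluster#triggerBlockReports() and the condition would be the existing "is the cluster balanced" check; both are stubbed out here with a counter.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class RefreshingWait {

  /** Hypothetical hook for an action that may fail, e.g. asking DataNodes for block reports. */
  @FunctionalInterface
  public interface Refresh {
    void run() throws Exception;
  }

  /**
   * Runs the refresh action, then checks the condition, and repeats until the
   * condition holds or the timeout expires. Returns the number of passes made.
   */
  public static int waitFor(Refresh refresh, BooleanSupplier condition,
      long checkEveryMillis, long timeoutMillis) throws Exception {
    final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    int passes = 0;
    while (true) {
      refresh.run();            // assumption: in the test, trigger block reports here
      passes++;
      if (condition.getAsBoolean()) {
        return passes;          // state observed after the refresh satisfies the check
      }
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("condition not met within "
            + timeoutMillis + " ms after " + passes + " passes");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for NameNode state that only catches up after three refreshes.
    final int[] reportsSeen = {0};
    int passes = waitFor(
        () -> reportsSeen[0]++,      // stub for the block report trigger
        () -> reportsSeen[0] >= 3,   // stub for the balanced check
        10, 1000);
    System.out.println("balanced after " + passes + " passes");
  }
}
```

The point of putting the refresh inside the loop is that each balanced check then runs against state that was just updated, so the result no longer depends on how the timeout compares to the periodic block report interval.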

> TestBalancer#testBalancerWithStripedFile is flaky
> -------------------------------------------------
>
>                 Key: HDFS-11682
>                 URL: https://issues.apache.org/jira/browse/HDFS-11682
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Andrew Wang
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-11682.00.patch, HDFS-11682.01.patch, IndexOutOfBoundsException.log, timeout.log
>
>
> Saw this fail in two different ways on a precommit run, but pass locally.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


