hadoop-hdfs-issues mailing list archives

From "Liang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6624) TestBlockTokenWithDFS#testEnd2End fails sometimes
Date Thu, 03 Jul 2014 08:26:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051183#comment-14051183 ]

Liang Xie commented on HDFS-6624:
---------------------------------

It seems to be related to the chooseExcessReplicates behavior.
From the log:
{code}
2014-07-02 21:23:30,083 INFO  balancer.Balancer (Balancer.java:logNodes(969)) - 0 over-utilized: []
2014-07-02 21:23:30,083 INFO  balancer.Balancer (Balancer.java:logNodes(969)) - 1 above-average: [Source[127.0.0.1:55922, utilization=28.0]]
2014-07-02 21:23:30,083 INFO  balancer.Balancer (Balancer.java:logNodes(969)) - 1 below-average: [BalancerDatanode[127.0.0.1:57889, utilization=16.0]]
2014-07-02 21:23:30,083 INFO  balancer.Balancer (Balancer.java:logNodes(969)) - 0 underutilized: []
The cluster is balanced. Exiting...
{code}
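The four buckets in the Balancer log above can be illustrated with a simplified sketch. It assumes a threshold of 10 percentage points around the 20% average (the threshold is an assumption, not a value read from the test), and it is not the Balancer's actual code; `BalancerClassifySketch`, `classify`, and `isBalanced` are hypothetical names.

```java
import java.util.List;

/**
 * Hypothetical sketch of bucketing datanodes by utilization relative to the
 * cluster average. Utilizations are percentages, as in the Balancer log.
 * The threshold passed in is an assumption, not taken from the test.
 */
public class BalancerClassifySketch {

    /** Buckets a single node by how far it sits from the average. */
    static String classify(double utilization, double avg, double threshold) {
        double diff = utilization - avg;
        boolean withinThreshold = Math.abs(diff) <= threshold;
        if (diff > 0) {
            return withinThreshold ? "above-average" : "over-utilized";
        }
        return withinThreshold ? "below-average" : "underutilized";
    }

    /** Treats the cluster as balanced when no node falls outside the threshold. */
    static boolean isBalanced(List<Double> utilizations, double avg, double threshold) {
        for (double u : utilizations) {
            String bucket = classify(u, avg, threshold);
            if (bucket.equals("over-utilized") || bucket.equals("underutilized")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double avg = 20.0;       // avgUtilization = 0.2, expressed as a percentage
        double threshold = 10.0; // assumed threshold, in percentage points
        System.out.println("127.0.0.1:55922 -> " + classify(28.0, avg, threshold));
        System.out.println("127.0.0.1:57889 -> " + classify(16.0, avg, threshold));
        System.out.println("balanced: " + isBalanced(List.of(28.0, 16.0), avg, threshold));
    }
}
```

With these inputs, 28.0 lands in above-average and 16.0 in below-average, so nothing is over- or under-utilized, which matches the "The cluster is balanced. Exiting..." line.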
and 
{code}
2014-07-02 21:23:35,413 INFO  hdfs.TestBalancer (TestBalancer.java:runBalancer(381)) - Rebalancing with default ctor.    <--- will call waitForBalancer immediately then.
{code}
From this we can tell that the cluster was already balanced at that point: avgUtilization = 0.2, so |0.28 - 0.2| = 0.08 < BALANCE_ALLOWED_VARIANCE (0.11) and |0.16 - 0.2| = 0.04 < BALANCE_ALLOWED_VARIANCE.
However, once the test entered waitForBalancer, the newly added DN kept reporting a nodeUtilization of only 0.08, no matter how many times a fresh DN report was fetched. Since |0.08 - 0.2| = 0.12 > 0.11, the check kept retrying until it timed out, and the test failed.
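The check that times out can be sketched as a polling loop around the per-node variance test described above. This is a hypothetical approximation of what waitForBalancer does, not the real TestBalancer code; `WaitForBalanceSketch`, `withinVariance`, and `waitForBalance` are illustrative names, and the per-node utilization (a fresh DN report in the real test) is stubbed with a `DoubleSupplier`.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.DoubleSupplier;

/**
 * Hypothetical sketch of a "wait until balanced" check in the spirit of
 * TestBalancer#waitForBalancer. Not the actual test implementation.
 */
public class WaitForBalanceSketch {
    static final double BALANCE_ALLOWED_VARIANCE = 0.11;

    /** True when the node is close enough to the expected average. */
    static boolean withinVariance(double nodeUtilization, double avgUtilization) {
        return Math.abs(nodeUtilization - avgUtilization) <= BALANCE_ALLOWED_VARIANCE;
    }

    /** Re-reads the node's utilization until it is within variance or the timeout expires. */
    static void waitForBalance(DoubleSupplier nodeUtilization, double avgUtilization,
                               long timeoutMs, long pollMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            double current = nodeUtilization.getAsDouble(); // stands in for a fresh DN report
            if (withinVariance(current, avgUtilization)) {
                return;
            }
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("expected avg utilization " + avgUtilization
                        + " but node remains at " + current);
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        double avg = 0.2;
        System.out.println(withinVariance(0.28, avg)); // prints true
        System.out.println(withinVariance(0.16, avg)); // prints true
        System.out.println(withinVariance(0.08, avg)); // prints false
        try {
            // A stubbed report that never recovers, like the DN stuck at 0.08.
            waitForBalance(() -> 0.08, avg, 200, 50);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

Because the stubbed utilization never changes, retrying cannot help: the loop can only end by timing out, which is the failure mode in the Jenkins run.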

From the logs we can also see that blocks were removed from that node after balancing:
{code}
2014-07-02 21:23:30,136 INFO  BlockStateChange (BlockManager.java:addToInvalidates(1074)) - BLOCK* addToInvalidates: blk_1073741840_1016 127.0.0.1:55922 127.0.0.1:57889
2014-07-02 21:23:30,136 INFO  BlockStateChange (BlockManager.java:addToInvalidates(1074)) - BLOCK* addToInvalidates: blk_1073741841_1017 127.0.0.1:57889
2014-07-02 21:23:30,136 INFO  BlockStateChange (BlockManager.java:addToInvalidates(1074)) - BLOCK* addToInvalidates: blk_1073741842_1018 127.0.0.1:57889
2014-07-02 21:23:34,305 INFO  BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3262)) - BLOCK* BlockManager: ask 127.0.0.1:57889 to delete [blk_1073741840_1016, blk_1073741841_1017, blk_1073741842_1018]
{code}

So the root cause appears to be that, after balancing, the added block operations trigger the excess-replica check; the resulting deletions change the used-space statistics, and that makes the test fail.
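The arithmetic behind this is simple: a node's utilization is its used space divided by its capacity, so invalidating replicas lowers it while the expected average in the test stays fixed at 0.2 (as the error message shows). The capacity and byte counts below are made-up numbers chosen only to reproduce the 0.16 to 0.08 drop; `UtilizationAfterInvalidateSketch` is a hypothetical name.

```java
/**
 * Hypothetical arithmetic sketch: utilization = used / capacity, so deleting
 * replicas after the balancer exits lowers it with no balancing move at all.
 * All byte counts are invented to reproduce the 0.16 -> 0.08 drop.
 */
public class UtilizationAfterInvalidateSketch {
    static final double EXPECTED_AVG = 0.2;  // fixed expectation, as in the error message
    static final double ALLOWED_VARIANCE = 0.11;

    static double utilization(long usedBytes, long capacityBytes) {
        return (double) usedBytes / capacityBytes;
    }

    static boolean withinVariance(double utilization) {
        return Math.abs(utilization - EXPECTED_AVG) <= ALLOWED_VARIANCE;
    }

    public static void main(String[] args) {
        long capacity = 1000;          // hypothetical capacity of the new DN
        long usedAfterBalancing = 160; // hypothetical: gives 0.16, as the balancer saw it
        long deletedBytes = 80;        // hypothetical size of the invalidated replicas

        double before = utilization(usedAfterBalancing, capacity);
        double after = utilization(usedAfterBalancing - deletedBytes, capacity);
        System.out.println("before deletions: " + before + ", within variance: " + withinVariance(before));
        System.out.println("after deletions:  " + after + ", within variance: " + withinVariance(after));
    }
}
```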

Is there a quick test setting that could bypass that check? :)

> TestBlockTokenWithDFS#testEnd2End fails sometimes
> -------------------------------------------------
>
>                 Key: HDFS-6624
>                 URL: https://issues.apache.org/jira/browse/HDFS-6624
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Andrew Wang
>         Attachments: PreCommit-HDFS-Build #7274 test - testEnd2End [Jenkins].html
>
>
> On a recent test-patch.sh run, saw this error which did not repro locally:
> {noformat}
> Error Message
> Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:57889 it remains at 0.08 after more than 40000 msec.
> Stacktrace
> java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:57889 it remains at 0.08 after more than 40000 msec.
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancer(TestBalancer.java:382)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:359)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:403)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.integrationTest(TestBalancer.java:416)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End(TestBlockTokenWithDFS.java:588)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
