hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10426) TestPendingInvalidateBlock failed in trunk
Date Wed, 28 Sep 2016 01:52:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528082#comment-15528082 ]

Yiqun Lin commented on HDFS-10426:

Thanks [~liuml07] for pointing this out. It seems that {{TestPendingInvalidateBlock#testPendingDeletion}}
still fails sometimes (it also appeared in HDFS-10915). It seems that blockManager still
schedules the block invalidation even though we have already made the method {{getInvalidationDelay}}
return 1, which indicates that we don't want to delete blocks right now. I'm not sure if there is
a race here. Can we delay the deletion operation and skip the current loop in ReplicationMonitor?
In the next loop, I think the Mockito stub will take effect. Ping [~iwasakims] for comments.
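
To make the "skip the current loop" idea concrete, here is a minimal sketch in plain Java. It is not Hadoop code: {{DelayAwareInvalidator}}, {{addPending}}, {{pendingCount}} and {{runOneIteration}} are hypothetical names, and the sketch assumes that a positive value from the delay supplier means the deletion delay is still active.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the invalidation scheduling step; not Hadoop code. */
class DelayAwareInvalidator {
    private final Queue<String> pendingBlocks = new ArrayDeque<>();
    // Supplies the remaining delay; assumed: a value > 0 means "do not delete yet".
    private final LongSupplier invalidationDelay;

    DelayAwareInvalidator(LongSupplier invalidationDelay) {
        this.invalidationDelay = invalidationDelay;
    }

    void addPending(String blockId) {
        pendingBlocks.add(blockId);
    }

    int pendingCount() {
        return pendingBlocks.size();
    }

    /**
     * One monitor iteration. While the delay is still active, the iteration is
     * skipped and every block stays pending; otherwise all pending blocks are
     * handed out for deletion.
     */
    List<String> runOneIteration() {
        if (invalidationDelay.getAsLong() > 0) {
            return new ArrayList<>(); // skip this loop, keep blocks pending
        }
        List<String> scheduled = new ArrayList<>(pendingBlocks);
        pendingBlocks.clear();
        return scheduled;
    }

    public static void main(String[] args) {
        long[] delay = {1L}; // plays the role of the stubbed delay value
        DelayAwareInvalidator inv = new DelayAwareInvalidator(() -> delay[0]);
        inv.addPending("blk_1");
        inv.addPending("blk_2");
        System.out.println("scheduled while delayed: " + inv.runOneIteration().size());
        delay[0] = 0L; // delay has expired
        System.out.println("scheduled after delay: " + inv.runOneIteration().size());
    }
}
```

In this model the pending count can only drop in an iteration that reads a non-positive delay, which is the property the test assertions rely on.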

> TestPendingInvalidateBlock failed in trunk
> ------------------------------------------
>                 Key: HDFS-10426
>                 URL: https://issues.apache.org/jira/browse/HDFS-10426
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>             Fix For: 2.8.0, 3.0.0-alpha2
>         Attachments: HDFS-10426.001.patch, HDFS-10426.002.patch, HDFS-10426.003.patch,
>                      HDFS-10426.004.patch, HDFS-10426.005.patch, HDFS-10426.006.patch
>
> The test {{TestPendingInvalidateBlock}} fails intermittently. The stack trace:
> {code}
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
> testPendingDeletion(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock)  Time elapsed: 7.703 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeletion(TestPendingInvalidateBlock.java:92)
> {code}
> It looks like the {{invalidateBlock}} had already been removed before we did the check:
> {code}
>     // restart NN
>     cluster.restartNameNode(true);
>     dfs.delete(foo, true);
>     Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal());
>     Assert.assertEquals(REPLICATION, cluster.getNamesystem()
>         .getPendingDeletionBlocks());
>     Assert.assertEquals(REPLICATION,
>         dfs.getPendingDeletionBlocksCount());
> {code}
> I looked into the related configurations and found that the property {{dfs.namenode.replication.interval}}
> was set to just 1 second in this test. If the delete operation is slow and completes only after the delay of
> {{dfs.namenode.startup.delay.block.deletion.sec}} has elapsed, the blocks can be invalidated before the check runs.
> The stack trace above is consistent with this: the failed test took 7.7 seconds, more than 5 + 1 seconds.
> One way to improve this:
> * Increase the value of {{dfs.namenode.replication.interval}}
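
The timing argument above can be checked with a small plain-Java sketch. It uses a deliberately simplified model, which is an assumption and not Hadoop behavior: pending deletions can first be scheduled once the startup deletion delay plus one replication interval has passed. {{DeletionWindow}} and its methods are hypothetical names, and the interval of 5 seconds in the second call is only an illustrative value.

```java
/** Simplified timing model for the race described above; not Hadoop code. */
class DeletionWindow {
    /** Earliest time, in seconds, at which pending deletions may be scheduled in this model. */
    static int earliestSchedulingSec(int startupDelaySec, int replicationIntervalSec) {
        return startupDelaySec + replicationIntervalSec;
    }

    /** True if the check may run after the blocks could already have been scheduled. */
    static boolean mayRace(double elapsedSec, int startupDelaySec, int replicationIntervalSec) {
        return elapsedSec > earliestSchedulingSec(startupDelaySec, replicationIntervalSec);
    }

    public static void main(String[] args) {
        // Values from the report: 5 s startup delay, 1 s interval, 7.7 s elapsed.
        System.out.println("interval 1 s: mayRace = " + mayRace(7.7, 5, 1));
        // Hypothetical larger interval, chosen only for illustration.
        System.out.println("interval 5 s: mayRace = " + mayRace(7.7, 5, 5));
    }
}
```

With the reported values the window closes at 6 seconds, so a run that takes 7.7 seconds falls outside it; a larger interval widens the window, which is the motivation for the proposed change.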

