hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10426) TestPendingInvalidateBlock failed in trunk
Date Mon, 30 May 2016 05:57:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Yiqun Lin updated HDFS-10426:
    Attachment: HDFS-10426.003.patch

Thanks [~iwasakims] for the comments.
To fix #2, we should wait until the replication count of the block reaches 2 before
I think the method {{DataNodeTestUtils.setHeartbeatsDisabledForTests}} makes sense for
this: we can disable heartbeats for the test here.

ReplicationMonitor kicks in between delete and checking pending deletion blocks count.
I'd like to add a test-only flag in {{BlockManager#ReplicationMonitor}}, which would need
only a few changes. However, I haven't found a way to control this without adding any code.
I would be glad to hear any suggestions you have.

The other intermittent failures seem to occur because the pending deletion blocks had not been
deleted completely when we did the check. We can add some retries for that. Attaching a new patch.
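
For illustration only, the retry idea could look roughly like the sketch below. This is not the code in the attached patch; the class name {{PendingDeletionWait}}, the method {{waitForValue}} and its parameters are hypothetical, and the supplier stands in for a call such as {{cluster.getNamesystem().getPendingDeletionBlocks()}}.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Hadoop: polls a counter until it reaches
// the expected value or the retries are used up.
final class PendingDeletionWait {

    private PendingDeletionWait() {
    }

    /**
     * Checks the counter once, then retries up to maxRetries more times,
     * sleeping sleepMillis between attempts.
     *
     * @return true if the counter reached the expected value, false otherwise
     */
    static boolean waitForValue(LongSupplier actual, long expected,
                                int maxRetries, long sleepMillis) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (actual.getAsLong() == expected) {
                return true;
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up early.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

In the test, the final check could then be an assertTrue on the returned value instead of a single assertEquals, so a slow deletion gets a few more chances before the test fails.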

> TestPendingInvalidateBlock failed in trunk
> ------------------------------------------
>                 Key: HDFS-10426
>                 URL: https://issues.apache.org/jira/browse/HDFS-10426
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10426.001.patch, HDFS-10426.002.patch, HDFS-10426.003.patch
> The test {{TestPendingInvalidateBlock}} failed sometimes. The stack info:
> {code}
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
> testPendingDeletion(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock)  Time elapsed: 7.703 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeletion(TestPendingInvalidateBlock.java:92)
> {code}
> It looks like the {{invalidateBlock}} has been removed before we do the check:
> {code}
>     // restart NN
>     cluster.restartNameNode(true);
>     dfs.delete(foo, true);
>     Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal());
>     Assert.assertEquals(REPLICATION, cluster.getNamesystem()
>         .getPendingDeletionBlocks());
>     Assert.assertEquals(REPLICATION,
>         dfs.getPendingDeletionBlocksCount());
> {code}
> I looked into the related configuration and found that the property {{dfs.namenode.replication.interval}}
> was set to just 1 second in this test. If the delete operation is slow and finishes only after the delay
> of {{dfs.namenode.startup.delay.block.deletion.sec}} has passed, this failure can occur. The stack info above
> shows that the failed test took 7.7s, which is more than 5+1 seconds.
> One way to improve this:
> * Increase the value of {{dfs.namenode.replication.interval}}

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
