hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10748) TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
Date Wed, 17 Aug 2016 03:10:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423791#comment-15423791 ]

Yiqun Lin commented on HDFS-10748:

Thanks [~xyao] for reporting this issue.
It seems HDFS-7886 did not completely fix this issue. See this comment in HDFS-7930 (https://issues.apache.org/jira/browse/HDFS-7930?focusedCommentId=14368053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14368053):
Although this will not fix the testTruncateWithDataNodesRestart() completely. The location
is correctly invalidated on the NN, but then NN postpones invalidation on the DN and waits
for the next report.
If I add triggerBlockReports() before waitReplication() then the test passes, as it finally
triggers deletion of the replica on the DN.
I think the main problem is that the block report has not been completely sent to the NameNode,
which leaves the cluster waiting for the replication.

I ran {{testTruncateWithDataNodesRestart}} in my local environment; it fails about once in every
3~5 runs. When I apply the change described in that comment, every run passes.
I think adding {{triggerBlockReports()}} makes sense for this JIRA.

Attaching a simple patch for this.
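
To make the reasoning above concrete, here is a minimal toy model in plain Java. It is not Hadoop code and not the attached patch: the class {{ToyNameNode}} and all of its methods are hypothetical names invented for illustration. It assumes, as a simplification, that re-replication cannot proceed while the deletion of the stale replica is still postponed, and that a block report is what clears the postponement.

```java
/** Toy model of the behaviour described above; NOT Hadoop code. */
final class ToyNameNode {
    private final int expectedReplicas;
    private int validReplicas;
    // True while deletion of the stale replica on the DataNode is postponed.
    private boolean invalidationPostponed;

    ToyNameNode(int expectedReplicas, int validReplicas, boolean invalidationPostponed) {
        this.expectedReplicas = expectedReplicas;
        this.validReplicas = validReplicas;
        this.invalidationPostponed = invalidationPostponed;
    }

    /** Hypothetical stand-in for a block report arriving from the DataNodes. */
    void receiveBlockReport() {
        // Assumption: the report lets the postponed deletion finally happen.
        invalidationPostponed = false;
    }

    /** One monitoring step: add a replica only if nothing blocks the target. */
    private void tick() {
        if (!invalidationPostponed && validReplicas < expectedReplicas) {
            validReplicas++;
        }
    }

    /** Hypothetical stand-in for a bounded wait; returns false on "timeout". */
    boolean waitReplication(int maxTicks) {
        for (int i = 0; i < maxTicks; i++) {
            if (validReplicas >= expectedReplicas) {
                return true;
            }
            tick();
        }
        return validReplicas >= expectedReplicas;
    }
}
```

In this sketch, calling {{waitReplication(10)}} on {{new ToyNameNode(3, 2, true)}} gives up without reaching 3 replicas, while calling {{receiveBlockReport()}} first lets the same wait succeed. That mirrors the ordering suggested in the quoted comment: trigger the block reports, then wait for replication.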

> TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
> ------------------------------------------------------------------------
>                 Key: HDFS-10748
>                 URL: https://issues.apache.org/jira/browse/HDFS-10748
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Xiaoyu Yao
> This was fixed by HDFS-7886, but some recent [Jenkins results|https://builds.apache.org/job/PreCommit-HDFS-Build/16390/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] have started showing this failure again:
> {code}
> Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 172.025 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
> testTruncateWithDataNodesRestart(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate)  Time elapsed: 43.861 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for /test/testTruncateWithDataNodesRestart to reach 3 replicas
> 	at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:751)
> 	at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestart(TestFileTruncate.java:704)
> {code}

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
