hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10603) Flaky test org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testWithCheckpoint
Date Sun, 10 Jul 2016 12:20:10 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369617#comment-15369617
] 

Yiqun Lin commented on HDFS-10603:
----------------------------------

Thanks [~yzhangal] for reporting this. I tested this unit test many times in my local env.
Every time the test failed, I found it always costed some time in processReport, the output
 log like these:
{code}
2016-07-10 20:04:56,585 [Block report processor] INFO  blockmanagement.BlockManager (BlockManager.java:processReport(2178))
- Processing first storage report for DS-7cd4704a-096b-4f6d-aa4f-6f7f74798e53 from datanode
154b2810-2bb9-4ec5-bdb3-c8d38988adae
2016-07-10 20:04:56,586 [Block report processor] INFO  BlockStateChange (BlockManager.java:processReport(2229))
- BLOCK* processReport: from storage DS-7cd4704a-096b-4f6d-aa4f-6f7f74798e53 node DatanodeRegistration(127.0.0.1:49884,
datanodeUuid=154b2810-2bb9-4ec5-bdb3-c8d38988adae, infoPort=49886, infoSecurePort=0, ipcPort=49887,
storageInfo=lv=-57;cid=testClusterID;nsid=653559418;c=1468152288554), blocks: 2, hasStaleStorage:
true, processing time: 1 msecs, invalidatedBlocks: 0
2016-07-10 20:04:56,586 [Block report processor] INFO  blockmanagement.BlockManager (BlockManager.java:processReport(2178))
- Processing first storage report for DS-3917a777-191f-448b-be88-2396c5308ab6 from datanode
154b2810-2bb9-4ec5-bdb3-c8d38988adae
2016-07-10 20:04:56,586 [Block report processor] INFO  BlockStateChange (BlockManager.java:processReport(2229))
- BLOCK* processReport: from storage DS-3917a777-191f-448b-be88-2396c5308ab6 node DatanodeRegistration(127.0.0.1:49884,
datanodeUuid=154b2810-2bb9-4ec5-bdb3-c8d38988adae, infoPort=49886, infoSecurePort=0, ipcPort=49887,
storageInfo=lv=-57;cid=testClusterID;nsid=653559418;c=1468152288554), blocks: 2, hasStaleStorage:
false, processing time: 1 msecs, invalidatedBlocks: 0
2016-07-10 20:04:56,587 [DataNode: [[[DISK]file:/Users/lyq/Documents/work-project/hadoop-trunk/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/,
[DISK]file:/Users/lyq/Documents/work-project/hadoop-trunk/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data6/]]
 heartbeating to localhost/127.0.0.1:49874] INFO  datanode.DataNode (BPServiceActor.java:blockReport(389))
- Successfully sent block report 0x665205cb9591e953,  containing 2 storage report(s), of which
we sent 2. The reports had 4 total blocks and used 1 RPC(s). This took 1 msec to generate
and 3 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
2016-07-10 20:04:56,587 [DataNode: [[[DISK]file:/Users/lyq/Documents/work-project/hadoop-trunk/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/,
[DISK]file:/Users/lyq/Documents/work-project/hadoop-trunk/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data6/]]
 heartbeating to localhost/127.0.0.1:49874] INFO  datanode.DataNode (BPOfferService.java:processCommandFromActive(696))
- Got finalize command for block pool BP-1517015184-127.0.0.1-1468152288554
2016-07-10 20:04:57,410 [main] WARN  hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(1356))
- Waiting for the Mini HDFS Cluster to start...
2016-07-10 20:04:58,417 [main] WARN  hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(1356))
- Waiting for the Mini HDFS Cluster to start...
2016-07-10 20:04:59,422 [main] WARN  hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(1356))
- Waiting for the Mini HDFS Cluster to start...
2016-07-10 20:05:00,423 [main] WARN  hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(1356))
- Waiting for the Mini HDFS Cluster to start...
2016-07-10 20:05:01,425 [main] WARN  hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(1356))
- Waiting for the Mini HDFS Cluster to start...
2016-07-10 20:05:02,430 [main] WARN  hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(1356))
- Waiting for the Mini HDFS Cluster to start...
2016-07-10 20:05:03,435 [main] ERROR hdfs.MiniDFSCluster (MiniDFSCluster.java:waitClusterUp(1362))
- Timed out waiting for Mini HDFS Cluster to start
{code}
And then the cluster start timeout. So I added a line {{cluster.triggerBlockReports();}} before
the operation {{cluster.restartNameNode(true);}}, it will speed this. I tested again for this
test many times with my change, the result were all passed.

In addition, the failed test {{TestOpenFilesWithSnapshot#testParentDirWithUCFileDeleteWithSnapShot}}
that showed in HDFS-10512(https://builds.apache.org/job/PreCommit-HDFS-Build/16005/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
and the similar test {{TestOpenFilesWithSnapshot#testUCFileDeleteWithSnapShot}} are both have
this problem. I also did a triggerBlockReports and savenamespace operations to speed the cluster
startup.

Post the patch from me.

> Flaky test org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testWithCheckpoint
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10603
>                 URL: https://issues.apache.org/jira/browse/HDFS-10603
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>            Reporter: Yongjun Zhang
>
> Test 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testWithCheckpoint
> may fail intermittently as
> {code}
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> Running org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
> Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.386 sec <<<
FAILURE! - in org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
> testWithCheckpoint(org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot)
 Time elapsed: 15.092 sec  <<< ERROR!
> java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1363)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2041)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2011)
> 	at org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testWithCheckpoint(TestOpenFilesWithSnapshot.java:94)
> Results :
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ยป IO Timed out waiting for Min...
> Tests run: 7, Failures: 0, Errors: 1, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message