hadoop-hdfs-issues mailing list archives

From "Tony Wu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9493) Test o.a.h.hdfs.server.namenode.TestMetaSave fails in trunk
Date Sat, 12 Dec 2015 02:03:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053953#comment-15053953 ]

Tony Wu commented on HDFS-9493:
-------------------------------

Hi [~liuml07], I would like to work on fixing this test.

I did some analysis of the failure by printing out the metasave content. It turns out the metasave
output for the current test still contains 2 Datanodes:
{code}
metasave out: 1 files and directories, 0 blocks = 1 total filesystem objects
metasave out: Live Datanodes: 1
metasave out: Dead Datanodes: 1
metasave out: Metasave: Blocks waiting for replication: 0
metasave out: Mis-replicated blocks that have been postponed:
metasave out: Metasave: Blocks being replicated: 0
metasave out: Metasave: Blocks 4 waiting deletion from 2 datanodes.
metasave out: 127.0.0.1:53465
metasave out: LightWeightHashSet(size=2, modification=2, entries.length=16)
metasave out: 127.0.0.1:53469
metasave out: LightWeightHashSet(size=2, modification=2, entries.length=16)
metasave out: Metasave: Number of datanodes: 2
metasave out: 127.0.0.1:53465 IN 998093619200(929.55 GB) 10270(10.03 KB) 0.00% 882663514112(822.04 GB) 0(0 B) 0(0 B) 100.00% 0(0 B) Fri Dec 11 17:48:41 PST 2015
metasave out: 127.0.0.1:53469 IN 998093619200(929.55 GB) 8192(8 KB) 0.00% 882663825408(822.04 GB) 0(0 B) 0(0 B) 100.00% 0(0 B) Fri Dec 11 17:48:26 PST 2015
{code}

This leads me to believe the following wait time was not long enough: 
{code:java}
    // wait for namenode to discover that a datanode is dead
    Thread.sleep(15000);
{code}

After increasing the sleep time to 30 seconds, the test was able to pass consistently.

The invalid block count shown in the {{Blocks x waiting deletion...}} statement is updated by {{blockManager.removeBlocksAssociatedTo()}},
which is called by {{DatanodeManager#removeDeadDatanode()}}. This only happens in {{HeartbeatManager#heartbeatCheck()}}.
Using a fixed sleep may not be the best way to ensure the Datanode has been removed by the Namenode.

I will upload a patch with a more robust way of waiting for the Datanode to be removed, instead
of relying on {{Thread.sleep()}}.
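
To illustrate the idea, here is a minimal sketch of a polling wait that could replace the fixed sleep. It is not the actual patch. The class name {{WaitForCondition}}, the {{waitFor}} helper, and the simulated live-datanode counter are all hypothetical stand-ins so the example runs without a cluster; in the real test the condition would query the Namenode's view of the Datanodes. I believe Hadoop's {{GenericTestUtils.waitFor()}} offers a similar facility, which would be the natural choice inside the test itself.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

// Hypothetical helper: polls a condition instead of sleeping for a fixed time.
class WaitForCondition {

  /**
   * Re-evaluates {@code check} every {@code checkEveryMillis} until it returns
   * true, or throws TimeoutException once {@code waitForMillis} has elapsed.
   */
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    final long start = System.currentTimeMillis();
    // Assumption: stand-in for the Namenode's live Datanode count. It drops
    // from 2 to 1 after roughly 200 ms, mimicking a heartbeat check that
    // eventually removes the dead Datanode.
    IntSupplier liveDatanodes = () -> System.currentTimeMillis() - start < 200 ? 2 : 1;

    // Return as soon as the condition holds; fail loudly after 5 seconds.
    waitFor(() -> liveDatanodes.getAsInt() == 1, 50, 5000);
    System.out.println("Dead datanode removed; safe to run metasave");
  }
}
```

Compared with {{Thread.sleep(15000)}}, this returns as soon as the condition holds and fails with an explicit timeout instead of a bare assertion error when the Datanode is never removed.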

> Test o.a.h.hdfs.server.namenode.TestMetaSave fails in trunk
> -----------------------------------------------------------
>
>                 Key: HDFS-9493
>                 URL: https://issues.apache.org/jira/browse/HDFS-9493
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Mingliang Liu
>
> Tested in both Gentoo Linux and Mac.
> {quote}
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> Running org.apache.hadoop.hdfs.server.namenode.TestMetaSave
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 34.159 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestMetaSave
> testMetasaveAfterDelete(org.apache.hadoop.hdfs.server.namenode.TestMetaSave)  Time elapsed: 15.318 sec  <<< FAILURE!
> java.lang.AssertionError: null
> 	at org.junit.Assert.fail(Assert.java:86)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.junit.Assert.assertTrue(Assert.java:52)
> 	at org.apache.hadoop.hdfs.server.namenode.TestMetaSave.testMetasaveAfterDelete(TestMetaSave.java:154)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
