hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1806) TestBlockReport.blockReport_08() and _09() are timing-dependent and likely to fail on fast servers
Date Tue, 12 Apr 2011 01:27:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018665#comment-13018665 ]

Matt Foley commented on HDFS-1806:
----------------------------------

Well, strictly speaking, since it is an intermittent failure, we'll actually have to wait and
see :-)

But yes, I see improvement on the same machine and under the same test conditions where the
intermittent failure was previously observed fairly consistently:

HDFS-1295 just passed auto-test without a TestBlockReport failure, whereas it had failed previously.
Indeed, it has been a long time since I saw TestBlockReport NOT be on the Hudson failure
list for several HDFS bugs I was interested in.

Although this is a single instance, I think we have rational grounds for believing that we
now understand the probable cause and have addressed it.  Regrettably, I can't test the failure
locally; it doesn't fail on any machine I've tried it on here.  The proof of the pudding
will be seeing whether the community stops being bothered by this false positive after the fix
is submitted to the public code base.



> TestBlockReport.blockReport_08() and _09() are timing-dependent and likely to fail on fast servers
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1806
>                 URL: https://issues.apache.org/jira/browse/HDFS-1806
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, name-node
>    Affects Versions: 0.22.0
>            Reporter: Matt Foley
>         Attachments: TestBlockReport.java.patch, blockReport_08_failure_log.html
>
>
> Method waitForTempReplica() polls every 100ms during block replication, attempting to
> "catch" a datanode in the state of having a TEMPORARY replica.  But examination of a current
> Hudson test failure log shows that the replica goes from "start" to "TEMPORARY" to "FINALIZED"
> in only 50ms, so of course the poll usually misses it.
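
The timing argument in the description can be illustrated with a small deterministic model. This is only a sketch, not the code of TestBlockReport or of the attached patch: the class and method names are hypothetical, the poll ticks are assumed to fall at exact multiples of the interval, and the whole 50ms is treated as an upper bound on how long the TEMPORARY state could be visible.

```java
// Hypothetical model of a fixed-interval poll trying to observe a short-lived state.
// Nothing here comes from the Hadoop code base; the names are illustrative only.
class PollingWindowSketch {

    /**
     * Returns true if some poll tick (0, interval, 2*interval, ...) falls inside
     * the half-open window [stateStartMs, stateStartMs + stateDurationMs).
     */
    static boolean pollObservesState(long pollIntervalMs, long stateStartMs, long stateDurationMs) {
        long stateEndMs = stateStartMs + stateDurationMs;
        // First poll tick at or after the moment the state begins.
        long firstTickMs = ((stateStartMs + pollIntervalMs - 1) / pollIntervalMs) * pollIntervalMs;
        return firstTickMs < stateEndMs;
    }

    /**
     * Fraction of possible start offsets (one per millisecond within a poll period)
     * for which the poll would observe the state at least once.
     */
    static double catchFraction(long pollIntervalMs, long stateDurationMs) {
        int caught = 0;
        for (long offsetMs = 0; offsetMs < pollIntervalMs; offsetMs++) {
            if (pollObservesState(pollIntervalMs, offsetMs, stateDurationMs)) {
                caught++;
            }
        }
        return (double) caught / pollIntervalMs;
    }

    public static void main(String[] args) {
        // 100ms poll, state visible for at most 50ms (the figures from the report).
        System.out.println(catchFraction(100, 50));
        // Same poll, but a state that lasts longer than the poll interval.
        System.out.println(catchFraction(100, 200));
    }
}
```

Under these assumptions, a 100ms poll sees a 50ms window for only half of the possible start times, and the TEMPORARY state occupies just part of that 50ms, so the real odds are lower still. A state that outlasts the poll interval is always seen, which is consistent with the failure being reported as likely on fast servers.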

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
