hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7886) TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
Date Sat, 07 Mar 2015 01:54:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351313#comment-14351313 ]

Konstantin Shvachko commented on HDFS-7886:
-------------------------------------------

More things that we've been looking at with Plamen.
5. The race condition is in {{FsDatasetImpl.getBlockReports()}}, which collects the references
to replicas inside a {{synchronized}} section, but then constructs {{BlockListAsLongs}} outside
of it. So if recovery is triggered between the two steps, a replica can change its state.
Here it changes from RUR to FINALIZED.
6. {{testTruncateWithDataNodesRestartImmediately()}} occasionally fails because the block is recovered
on only two DNs. This happens because the NN does not know that two DNs were restarted and can
schedule block recovery with a mixture of old (before the restart) and new (after the restart)
locations. If an old location is used then recovery fails, because that DN has been restarted
under a new address. {{waitActive()}} doesn't help here. We should somehow check that all
new DNs have been registered and sent block reports.
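
To illustrate the pattern described in item 5, here is a minimal, self-contained sketch. It does not use any HDFS classes: {{Replica}}, {{ReportEntry}}, {{Dataset}}, {{collectReferences()}}, {{snapshot()}} and {{finalizeReplica()}} are hypothetical stand-ins invented for this example, and the sketch is not the fix applied in the patch. It only shows why collecting references under a lock and reading their state later can observe a state change, whereas copying the state while the lock is held cannot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types are HDFS classes.
class BlockReportSnapshotSketch {
    enum State { RUR, FINALIZED }

    /** Mutable stand-in for a replica whose state recovery may change. */
    static final class Replica {
        final long blockId;
        private volatile State state;

        Replica(long blockId, State state) {
            this.blockId = blockId;
            this.state = state;
        }

        State getState() { return state; }
        void setState(State s) { state = s; }
    }

    /** Immutable copy of the fields a report needs. */
    static final class ReportEntry {
        final long blockId;
        final State state;

        ReportEntry(long blockId, State state) {
            this.blockId = blockId;
            this.state = state;
        }
    }

    static final class Dataset {
        private final List<Replica> replicas = new ArrayList<>();

        synchronized void add(Replica r) { replicas.add(r); }

        /** Stand-in for recovery finishing: RUR becomes FINALIZED. */
        synchronized void finalizeReplica(long blockId) {
            for (Replica r : replicas) {
                if (r.blockId == blockId) {
                    r.setState(State.FINALIZED);
                }
            }
        }

        /** Racy pattern: only references are copied under the lock. */
        synchronized List<Replica> collectReferences() {
            return new ArrayList<>(replicas);
        }

        /** Safer pattern: the state itself is copied under the lock. */
        synchronized List<ReportEntry> snapshot() {
            List<ReportEntry> out = new ArrayList<>();
            for (Replica r : replicas) {
                out.add(new ReportEntry(r.blockId, r.getState()));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Dataset ds = new Dataset();
        ds.add(new Replica(1L, State.RUR));

        List<Replica> refs = ds.collectReferences();  // lock released after this call
        List<ReportEntry> snap = ds.snapshot();       // state captured under the lock

        ds.finalizeReplica(1L);                       // "recovery" runs in between

        // The report built later from references sees the new state.
        System.out.println("reference view: " + refs.get(0).getState());
        // The snapshot still reflects the state at collection time.
        System.out.println("snapshot view:  " + snap.get(0).state);
    }
}
```

The "recovery" step is run sequentially here so the outcome is deterministic; in the real test it would happen on another thread, which is what makes the failure intermittent.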

> TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
> ------------------------------------------------------------------------
>
>                 Key: HDFS-7886
>                 URL: https://issues.apache.org/jira/browse/HDFS-7886
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.7.0
>            Reporter: Yi Liu
>            Assignee: Plamen Jeliazkov
>            Priority: Minor
>         Attachments: HDFS-7886.patch
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
