hadoop-hdfs-issues mailing list archives

From "Byron Wong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7611) TestFileTruncate.testTruncateEditLogLoad times out waiting for Mini HDFS Cluster to start
Date Tue, 20 Jan 2015 20:25:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284351#comment-14284351
] 

Byron Wong commented on HDFS-7611:
----------------------------------

This bug happens only in tests with restarts. The cause is that blocks from files created
in previous tests are not deleted when the edit logs are replayed.
1) I'm still investigating the source of this, but at some point while replaying edits, {{DirectoryWithSnapshotFeature$cleanDirectory}}
can decrement an INode's namespace quota to a negative value. Either the namespace count was overcounted
while cleaning directories or snapshotDiff, or the INode's namespace quota wasn't counted
up properly in the first place.
2) If the INode's namespace quota happens to be -1, the blocks associated with that inode
will not be deleted. When we call {{fsd.removeLastINode(iip)}} in {{FSDirDeleteOp$unprotectedDelete}},
we explicitly check whether its return code is -1. In that case, we skip collecting the blocks
that should be deleted. Notice that in {{FSDirectory$removeLastINode}}, one of the possible
return values is {{return counts.get(Quota.NAMESPACE)}}.
3) Now there are blocks in the blocksMap that shouldn't be there. This will increase the number
of blocks needed to get out of safeMode. The test failure depends on whether the namenode
receives these blocks. If it does, then the namenode will exit safeMode and the test will
succeed.
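
To make steps 2) and 3) concrete, here is a minimal sketch of the sentinel collision, not the actual HDFS code. Everything in it is a hypothetical stand-in: {{FakeINode}}, the simplified {{removeLastINode}} and {{unprotectedDelete}} methods, and the {{canLeaveSafeMode}} rule (which assumes safe mode is left once reported blocks reach a threshold fraction of the total, with 1.0 used purely for illustration).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the behaviour described above.
// None of these types are the real HDFS classes.
class SentinelCollisionSketch {

    /** Stand-in for an inode: its block IDs plus its namespace count. */
    static final class FakeINode {
        final List<Long> blockIds;
        final long namespaceCount;

        FakeINode(List<Long> blockIds, long namespaceCount) {
            this.blockIds = blockIds;
            this.namespaceCount = namespaceCount;
        }
    }

    /** Models removeLastINode: -1 means "nothing removed", else the namespace count. */
    static long removeLastINode(FakeINode inode) {
        if (inode == null) {
            return -1;
        }
        return inode.namespaceCount; // a miscounted value of -1 aliases the sentinel
    }

    /** Models unprotectedDelete: collects blocks unless the removal reported -1. */
    static long unprotectedDelete(FakeINode inode, List<Long> collectedBlocks) {
        long removed = removeLastINode(inode);
        if (removed == -1) {
            return -1; // block collection is skipped
        }
        collectedBlocks.addAll(inode.blockIds);
        return removed;
    }

    /** Assumed safe-mode rule: reported blocks must reach thresholdPct * totalBlocks. */
    static boolean canLeaveSafeMode(long totalBlocks, long reportedBlocks, double thresholdPct) {
        return reportedBlocks >= (long) (totalBlocks * thresholdPct);
    }

    public static void main(String[] args) {
        List<Long> collected = new ArrayList<>();
        FakeINode healthy = new FakeINode(Arrays.asList(1L, 2L), 1);
        FakeINode miscounted = new FakeINode(Arrays.asList(3L, 4L, 5L), -1);

        unprotectedDelete(healthy, collected);
        unprotectedDelete(miscounted, collected); // blocks 3, 4, 5 are never collected
        System.out.println("collected for deletion: " + collected);

        long liveBlocks = 2;   // blocks the current test actually expects
        long leakedBlocks = 3; // blocks left behind in the blocks map
        System.out.println("no leak, can leave safe mode: "
            + canLeaveSafeMode(liveBlocks, liveBlocks, 1.0));
        System.out.println("with leak, can leave safe mode: "
            + canLeaveSafeMode(liveBlocks + leakedBlocks, liveBlocks, 1.0));
    }
}
```

In this model, an inode whose count has drifted to -1 is indistinguishable from a failed removal, so its blocks stay in the total. The namenode can then leave safe mode only if the leftover blocks are also reported, which matches the intermittent nature of the failure.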

> TestFileTruncate.testTruncateEditLogLoad times out waiting for Mini HDFS Cluster to start
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-7611
>                 URL: https://issues.apache.org/jira/browse/HDFS-7611
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Byron Wong
>         Attachments: testTruncateEditLogLoad.log
>
>
> I've seen it failing on Jenkins a couple of times. Somehow the cluster is not becoming
> ready after NN restart.
> Not sure if it is truncate specific, as I've seen same behaviour with other tests that
> restart the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
