hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
Date Fri, 11 Oct 2013 20:00:44 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792996#comment-13792996 ]

Tsz Wo (Nicholas), SZE commented on HDFS-5283:
----------------------------------------------

Vinay, thanks for working on this.  Some comments:

The new method added to Namesystem would be better if it
# took a BlockInfoUnderConstruction parameter,
# were named isInSnapshot, and
# did not throw IOException.

i.e.
{code}
//Namesystem.java
public boolean isInSnapshot(BlockInfoUnderConstruction block);
{code}
The implementation in FSNamesystem should catch the UnresolvedLinkException and log it as
an error, since the full path obtained from a file should not contain an unresolved link.
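
A minimal, self-contained sketch of that shape follows. It is not the patch itself: the nested UnresolvedLinkException and BlockInfoUnderConstruction classes are simplified stand-ins for the Hadoop types, existsInLiveNamespace(..) is a hypothetical helper, and returning false after logging is an assumption about the desired fallback.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

public class IsInSnapshotSketch {
    /** Stand-in for org.apache.hadoop.fs.UnresolvedLinkException (a checked IOException). */
    static class UnresolvedLinkException extends IOException {
        UnresolvedLinkException(String msg) { super(msg); }
    }

    /** Simplified stand-in for the real block class; it only carries the file's full path. */
    static class BlockInfoUnderConstruction {
        final String fullPath;
        BlockInfoUnderConstruction(String fullPath) { this.fullPath = fullPath; }
    }

    /** The suggested interface shape: no checked exception in the signature. */
    interface Namesystem {
        boolean isInSnapshot(BlockInfoUnderConstruction block);
    }

    static class FSNamesystem implements Namesystem {
        private static final Logger LOG = Logger.getLogger("FSNamesystem");
        private final Set<String> liveFiles = new HashSet<>();
        private final Set<String> symlinks = new HashSet<>();

        void addLiveFile(String path) { liveFiles.add(path); }
        void addSymlink(String path) { symlinks.add(path); }

        /** Hypothetical lookup: does the path still exist outside of snapshots? */
        private boolean existsInLiveNamespace(String path) throws UnresolvedLinkException {
            if (symlinks.contains(path)) {
                throw new UnresolvedLinkException(path);
            }
            return liveFiles.contains(path);
        }

        @Override
        public boolean isInSnapshot(BlockInfoUnderConstruction block) {
            try {
                // A file missing from the live namespace is treated as snapshot-only.
                return !existsInLiveNamespace(block.fullPath);
            } catch (UnresolvedLinkException e) {
                // Not expected for a full path obtained from a file: log it, do not rethrow.
                LOG.log(Level.SEVERE, "Unexpected unresolved link: " + block.fullPath, e);
                return false; // assumed fallback for this sketch
            }
        }
    }

    public static void main(String[] args) {
        FSNamesystem fs = new FSNamesystem();
        fs.addLiveFile("/user/job/open-file");
        System.out.println(fs.isInSnapshot(new BlockInfoUnderConstruction("/user/job/open-file")));
        System.out.println(fs.isInSnapshot(new BlockInfoUnderConstruction("/user/job/deleted-file")));
    }
}
```

With this shape, callers of Namesystem do not have to handle a checked exception for a condition that should never occur.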

Another question: why add DFSTestUtil.abortStream(..)?  It does not look very useful.

> NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5283
>                 URL: https://issues.apache.org/jira/browse/HDFS-5283
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Vinay
>            Assignee: Vinay
>            Priority: Blocker
>         Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch
>
>
> This was observed in one of our environments:
> 1. An MR job was running; it had created some temporary files and was writing to them.
> 2. A snapshot was taken.
> 3. The job was killed and the temporary files were deleted.
> 4. The Namenode was restarted.
> 5. After the restart, the Namenode stayed in safemode, waiting for blocks.
> Analysis
> ---------
> 1. The snapshot also included the temporary files, which were still open; the original files were deleted later.
> 2. The under-construction (UC) block count was taken from leases only, so UC blocks that exist only inside snapshots were not considered.
> 3. So the safemode threshold count was too high and the NN did not come out of safemode.
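
The arithmetic behind the analysis above can be sketched as follows. This is a simplified model, not the NameNode code: the class and method names are hypothetical, the block counts are made-up example numbers, and 0.999 is used only as an example threshold percentage.

```java
public class SafemodeThresholdSketch {
    /**
     * Simplified model: the number of blocks that must be reported before
     * safemode can end is a fraction of the blocks expected to be complete,
     * i.e. all blocks minus the under-construction (UC) blocks.
     */
    static long blockThreshold(long totalBlocks, long ucBlocks, double thresholdPct) {
        return (long) ((totalBlocks - ucBlocks) * thresholdPct);
    }

    /** Safemode can end once enough blocks have been reported. */
    static boolean canLeaveSafemode(long reportedBlocks, long threshold) {
        return reportedBlocks >= threshold;
    }

    public static void main(String[] args) {
        long totalBlocks = 1000;      // example: all blocks known to the namespace
        long ucInLeases = 0;          // leases were removed when the files were deleted
        long ucOnlyInSnapshots = 10;  // open files captured by the snapshot
        long reported = 990;          // example: complete blocks reported by DataNodes
        double pct = 0.999;           // example threshold percentage

        // Counting UC blocks from leases only: the threshold stays out of reach.
        long leaseOnly = blockThreshold(totalBlocks, ucInLeases, pct);
        System.out.println("lease-only count, can leave: "
                + canLeaveSafemode(reported, leaseOnly));

        // Also excluding snapshot-only UC blocks: the threshold becomes reachable.
        long withSnapshots = blockThreshold(totalBlocks, ucInLeases + ucOnlyInSnapshots, pct);
        System.out.println("snapshot-aware count, can leave: "
                + canLeaveSafemode(reported, withSnapshots));
    }
}
```

In this model the snapshot-only UC blocks are never reported as complete, so leaving them in the expected total keeps the NN below the threshold indefinitely.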



--
This message was sent by Atlassian JIRA
(v6.1#6144)
