hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
Date Fri, 04 Oct 2013 01:57:43 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785809#comment-13785809 ]

Jing Zhao commented on HDFS-5283:
---------------------------------

bq. We need to have some reference which tells us that the BlockCollection resides inside the
snapshot if we are not able to find it outside. I think in the directory delete case with snapshot,
changing the INode types recursively is necessary. This keeps the behaviour of both cases
(file deletion and directory deletion) consistent.

So in our current solution, for each BlockCollection (which is an INodeUC) in the blocksMap,
we first check whether it is in the current fsdir tree. Our claim is that if the inode is not in
the current tree (i.e., we cannot identify the node's absolute full path, or the node found at
that absolute full path in the current fsdir tree is not the node stored in the blocksMap),
then the inode must be a file that exists only in a snapshot, regardless of whether it is an
instance of INodeUCWithSnapshot. If this claim holds, converting an INodeUC to an
INodeUCWithSnapshot during deletion is unnecessary.
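
The check described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: Node and SnapshotOnlyCheck are hypothetical stand-ins, not the HDFS INode classes or the code in the patch, and the tree is a plain map of children with parent pointers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an inode; not the HDFS INode class.
final class Node {
    final String name;
    Node parent; // kept after removal, like a file that lives on only in a snapshot
    final Map<String, Node> children = new HashMap<>();

    Node(String name) {
        this.name = name;
    }

    Node addChild(Node child) {
        children.put(child.name, child);
        child.parent = this;
        return child;
    }

    void removeChild(String childName) {
        children.remove(childName); // the removed child keeps its parent pointer
    }
}

final class SnapshotOnlyCheck {
    // Returns true when `stored` (the node held in the blocks map) cannot be
    // reached in the current tree under its own absolute path.
    static boolean isOnlyInSnapshot(Node root, Node stored) {
        // Build the absolute path, root-first, by walking parent pointers.
        Deque<String> path = new ArrayDeque<>();
        Node cur = stored;
        while (cur != null && cur != root) {
            path.push(cur.name);
            cur = cur.parent;
        }
        if (cur != root) {
            return true; // no absolute path can be identified
        }
        // Resolve that path in the current tree.
        Node resolved = root;
        for (String component : path) {
            resolved = resolved.children.get(component);
            if (resolved == null) {
                return true; // path no longer exists in the current tree
            }
        }
        // The path exists, but it may now point at a different node.
        return resolved != stored;
    }
}
```

Note that the final identity comparison covers the case where a new file was created at the same path after the original was deleted.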

bq. storedBlock.addNode(node);

Without this, the new test fails when the DN number is set to a value >1. Currently, when
the NN receives the first block report from a DN, for each block in the report it checks
the block's total number of available replicas, and if that number is EQUAL to the minimum
required replica number, it increases the blockSafe value by 1 in the safemodeInfo. Thus, when
we call "namesystem.incrementSafeBlockCount(numOfReplicas)", numOfReplicas must be the
current number of available replicas. Otherwise we miss the EQUAL case and fail to increase
the blockSafe number.
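
To illustrate why the count passed in has to be current, here is a minimal sketch. SafeBlockCounter and BlockReportSketch are hypothetical classes, not the FSNamesystem code; the only behaviour taken from the description above is that the counter is bumped when the replica count is EQUAL to the minimum. The "stale" branch is an assumed failure mode chosen for illustration, not necessarily what the unpatched code does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the safe-block counter: it is bumped only when the
// replica count passed in is exactly EQUAL to the minimum.
final class SafeBlockCounter {
    private final int minReplicas;
    private int blockSafe;

    SafeBlockCounter(int minReplicas) {
        this.minReplicas = minReplicas;
    }

    void incrementSafeBlockCount(int numOfReplicas) {
        if (numOfReplicas == minReplicas) {
            blockSafe++;
        }
    }

    int blockSafe() {
        return blockSafe;
    }
}

final class BlockReportSketch {
    // Simulates one block being reported by each DN in turn and returns the
    // resulting blockSafe value.
    static int process(int minReplicas, String[] reportingNodes, boolean recordReplicaFirst) {
        SafeBlockCounter counter = new SafeBlockCounter(minReplicas);
        Set<String> replicas = new HashSet<>();
        for (String dn : reportingNodes) {
            if (recordReplicaFirst) {
                // Record the replica, then pass the up-to-date count.
                replicas.add(dn);
                counter.incrementSafeBlockCount(replicas.size());
            } else {
                // Assumed failure mode: the replica is never recorded, so the
                // derived count is the same stale value for every report.
                counter.incrementSafeBlockCount(replicas.size() + 1);
            }
        }
        return counter.blockSafe();
    }
}
```

In this model, with a minimum of 2 and two reporting DNs, the stale branch passes 1 both times and never hits the EQUAL case, while recording the replica first passes 1 and then 2 and counts the block exactly once.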

> NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5283
>                 URL: https://issues.apache.org/jira/browse/HDFS-5283
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Vinay
>            Assignee: Vinay
>            Priority: Blocker
>         Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch
>
>
> This was observed in one of our environments:
> 1. An MR job was running; it had created some temporary files and was writing to them.
> 2. A snapshot was taken.
> 3. The job was killed and the temporary files were deleted.
> 4. The Namenode was restarted.
> 5. After the restart, the Namenode stayed in safemode waiting for blocks.
> Analysis
> ---------
> 1. The snapshot also included the temporary files that were open, and the original files were later deleted.
> 2. The UnderConstruction block count was taken from leases; UC blocks that exist only inside snapshots were not considered.
> 3. So the safemode threshold count was too high and the NN did not come out of safemode.
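
The arithmetic behind the analysis can be sketched as follows. This is a simplified model, assuming safemode is left once the number of safe blocks reaches a configured fraction of the total block count; the class name, the numbers (1000 complete blocks, 5 UC blocks that exist only in snapshots) and the 0.999 fraction are illustrative assumptions, not values from the report.

```java
// Hypothetical, simplified model of the safemode exit condition.
final class SafeModeThresholdSketch {
    // Number of safe blocks needed before safemode can be left.
    static long blockThreshold(long blockTotal, double thresholdPct) {
        return (long) (blockTotal * thresholdPct);
    }

    static boolean canLeave(long blockSafe, long blockTotal, double thresholdPct) {
        return blockSafe >= blockThreshold(blockTotal, thresholdPct);
    }

    public static void main(String[] args) {
        long completeBlocks = 1000;   // blocks the DNs can actually report
        long ucOnlyInSnapshots = 5;   // UC blocks with no lease, never reported as safe
        double pct = 0.999;

        // UC blocks inside snapshots wrongly included in the total: stuck.
        System.out.println(canLeave(completeBlocks, completeBlocks + ucOnlyInSnapshots, pct)); // false

        // The same blocks excluded from the total: safemode can be left.
        System.out.println(canLeave(completeBlocks, completeBlocks, pct)); // true
    }
}
```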



--
This message was sent by Atlassian JIRA
(v6.1#6144)
