hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold
Date Tue, 01 Oct 2013 23:14:24 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5283:
----------------------------

    Attachment: HDFS-5283.000.patch

Thanks for the fix, Vinay!

Your analysis makes sense to me, and I think your patch can fix the file-deletion scenario.
For the directory-deletion scenario, instead of changing the current snapshot code (i.e., converting
all the INodeFileUC instances under the deleted directory to INodeFileUCWithSnapshot), I think we can
simply check whether the full name of the INodeFile retrieved from the blocksMap still resolves to
an INode in the current fsdir tree and, if so, whether that inode is the same one as in the blocksMap.
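
To make the proposed check concrete, here is a minimal Java sketch. It is not the code in HDFS-5283.000.patch: ToyINodeFile, ToyFsDirectory, getINode, deleteDir and isInCurrentTree are hypothetical stand-ins for INodeFile, the fsdir tree and the NameNode's path lookup, and modelling the current tree as a path-to-inode map is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotOnlyFileCheck {

    // Hypothetical stand-in for INodeFile; only identity and full path matter here.
    static final class ToyINodeFile {
        final String fullPath;
        ToyINodeFile(String fullPath) { this.fullPath = fullPath; }
    }

    // Hypothetical stand-in for the current (non-snapshot) fsdir tree:
    // maps each full path to the inode currently stored there.
    static final class ToyFsDirectory {
        private final Map<String, ToyINodeFile> files = new HashMap<>();

        void addFile(ToyINodeFile f) { files.put(f.fullPath, f); }

        // Deleting a directory removes every file beneath it from the current tree;
        // a snapshot taken earlier may still reference those inodes.
        void deleteDir(String dir) {
            files.keySet().removeIf(p -> p.startsWith(dir + "/"));
        }

        ToyINodeFile getINode(String path) { return files.get(path); }
    }

    /**
     * True if the file found through the blocks map is still in the current tree:
     * its full name must resolve, and it must resolve to the very same inode.
     * False means the file is referenced only from a snapshot.
     */
    static boolean isInCurrentTree(ToyFsDirectory dir, ToyINodeFile fromBlocksMap) {
        ToyINodeFile current = dir.getINode(fromBlocksMap.fullPath);
        // Reference comparison: a new file re-created under the same name is a different inode.
        return current != null && current == fromBlocksMap;
    }

    public static void main(String[] args) {
        ToyFsDirectory dir = new ToyFsDirectory();
        ToyINodeFile tmp = new ToyINodeFile("/job/_temporary/part-0");
        dir.addFile(tmp);
        System.out.println(isInCurrentTree(dir, tmp)); // true

        dir.deleteDir("/job/_temporary");              // directory-deletion scenario
        System.out.println(isInCurrentTree(dir, tmp)); // false

        dir.addFile(new ToyINodeFile("/job/_temporary/part-0")); // same name, new inode
        System.out.println(isInCurrentTree(dir, tmp)); // false
    }
}
```

The identity comparison is what covers the case where a file with the same name has been re-created after the deletion: the path resolves, but to a different inode, so the one in the blocks map is still treated as snapshot-only.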

So I have attached a patch based on your existing patch, with the extra check described
above and some other small fixes. We can continue working on this patch if you think this
is the right approach.

> NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5283
>                 URL: https://issues.apache.org/jira/browse/HDFS-5283
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Vinay
>            Assignee: Vinay
>            Priority: Blocker
>         Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch
>
>
> This was observed in one of our environments:
> 1. An MR job was running that had created some temporary files and was writing to them.
> 2. A snapshot was taken.
> 3. The job was killed and the temporary files were deleted.
> 4. The NameNode was restarted.
> 5. After the restart, the NameNode stayed in safemode, waiting for blocks.
> Analysis
> ---------
> 1. The snapshot includes the temporary files that were still open when it was taken; the original files were deleted afterwards.
> 2. The under-construction (UC) block count was taken from leases. Since deleting the files also removed their leases, UC blocks that exist only inside snapshots were not counted.
> 3. As a result, the safemode threshold count was too high and the NN did not come out of safemode.
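
The arithmetic behind point 3 can be sketched in Java. This is a toy model under stated assumptions, not FSNamesystem's code: the method names are hypothetical, the block counts are illustrative only, the threshold is taken as the truncated product of the block total and 0.999 (the default of dfs.namenode.safemode.threshold-pct), and under-construction blocks are assumed never to count as safe.

```java
public class SafeModeThresholdSketch {

    // Default value of dfs.namenode.safemode.threshold-pct.
    static final double THRESHOLD_PCT = 0.999;

    /**
     * Hypothetical helper: number of safe blocks the NameNode waits for, given all
     * blocks in the blocks map minus the under-construction blocks it excludes.
     */
    static int blockThreshold(int blocksInBlocksMap, int ucBlocksExcluded) {
        int blockTotal = blocksInBlocksMap - ucBlocksExcluded;
        return (int) (blockTotal * THRESHOLD_PCT);
    }

    static boolean canLeaveSafeMode(int safeBlocks, int threshold) {
        return safeBlocks >= threshold;
    }

    public static void main(String[] args) {
        int completeBlocks = 1000; // assumed to be reported by DataNodes and become safe
        int ucWithLease = 10;      // open files still in the current tree (lease exists)
        int ucSnapshotOnly = 5;    // open files deleted after the snapshot (lease gone)
        int blocksInMap = completeBlocks + ucWithLease + ucSnapshotOnly;

        // UC blocks counted from leases only: snapshot-only UC blocks stay in the total.
        int before = blockThreshold(blocksInMap, ucWithLease);
        System.out.println(before + " " + canLeaveSafeMode(completeBlocks, before)); // 1003 false

        // Also excluding the UC blocks that exist only inside snapshots.
        int after = blockThreshold(blocksInMap, ucWithLease + ucSnapshotOnly);
        System.out.println(after + " " + canLeaveSafeMode(completeBlocks, after)); // 999 true
    }
}
```

In this model the five snapshot-only UC blocks raise the target from 999 to 1003, which the 1000 complete blocks can never reach, so the NameNode would wait indefinitely.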



--
This message was sent by Atlassian JIRA
(v6.1#6144)
