hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
Date Tue, 05 Nov 2013 07:00:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813700#comment-13813700 ]

Jing Zhao commented on HDFS-5443:
---------------------------------

bq. for one level of directory and file structure this method works, if directory structure is large like

Thanks [~sathish.gurram]! I think the issue is this:
# If the file is already an INodeFileUnderConstructionWithSnapshot, the current code eventually calls collectBlocksAndClear and removes the 0-sized block.
# If the file is only an INodeFileUC (not an INodeUCWithSnapshot), deleting its parent or an ancestor directory does nothing to the file, so the 0-sized block is left behind.

So I think we should first fix the issue above here: when a file is deleted, the 0-sized block must always be deleted too (unless the operation is a rename). I will write some unit tests to verify this and create a separate jira if necessary.
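The two cases above and the proposed fix can be sketched as a toy model. This is a minimal sketch, not HDFS code: ToyDeleteModel, UcFile, deleteAncestorOld and deleteAncestorProposed are hypothetical names, the blocksMap field only stands in for the set of blocks the NameNode expects DataNodes to report, and the hasSnapshotFeature flag stands in for INodeFileUnderConstructionWithSnapshot.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the delete path discussed in the comment.
// None of these names are the real HDFS INode or BlockManager API.
final class ToyDeleteModel {

    // A file still being written: addBlock has allocated its last block,
    // but no data was ever written to it, so the block is 0-sized.
    static final class UcFile {
        final List<Long> blockIds = new ArrayList<>();
        final boolean hasSnapshotFeature; // stand-in for ...UnderConstructionWithSnapshot
        long lastBlockLength = 0;

        UcFile(boolean hasSnapshotFeature) {
            this.hasSnapshotFeature = hasSnapshotFeature;
        }
    }

    // Stand-in for the blocks the NameNode expects to see reported.
    final Set<Long> blocksMap = new HashSet<>();

    UcFile createWithEmptyLastBlock(long blockId, boolean hasSnapshotFeature) {
        UcFile f = new UcFile(hasSnapshotFeature);
        f.blockIds.add(blockId);
        blocksMap.add(blockId);
        return f;
    }

    // Behaviour described in the comment: only the snapshot-aware file type
    // clears the 0-sized block when an ancestor directory is deleted.
    void deleteAncestorOld(UcFile f) {
        if (f.hasSnapshotFeature) {
            removeEmptyLastBlock(f);
        }
    }

    // Proposed behaviour: always drop the 0-sized block on delete,
    // unless the operation is a rename.
    void deleteAncestorProposed(UcFile f, boolean isRename) {
        if (!isRename) {
            removeEmptyLastBlock(f);
        }
    }

    private void removeEmptyLastBlock(UcFile f) {
        if (!f.blockIds.isEmpty() && f.lastBlockLength == 0) {
            long last = f.blockIds.remove(f.blockIds.size() - 1); // remove by index
            blocksMap.remove(last);
        }
    }
}
```

In this model a plain under-construction file keeps its block in blocksMap after deleteAncestorOld, while deleteAncestorProposed removes it for every file type and leaves it in place only for a rename.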

> Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5443
>                 URL: https://issues.apache.org/jira/browse/HDFS-5443
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: sathish
>
> This issue was reported by Prakash and Sathish.
> Looking into the issue, the following sequence occurs:
>
> 1) The client adds a block at the NN and the NN has just done logsync,
>    so the NN has the block ID persisted.
> 2) Before the addBlock response is returned to the client, take a snapshot of the root or of a parent directory of that file.
> 3) Delete the parent directory of that file.
> 4) Crash the NN without returning success to the client for that addBlock call.
> On restart, the NameNode gets stuck in safemode.
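Why a leftover block can matter at restart can be illustrated with a simplified safe-mode check. This sketch assumes a rule where the NameNode leaves safe mode only once the fraction of expected blocks reported by DataNodes reaches a threshold (HDFS exposes such a setting as dfs.namenode.safemode.threshold-pct). The real NameNode accounting is more involved and is not described in this thread; ToySafeMode and canLeaveSafeMode are hypothetical names, and the threshold of 1.0 is chosen purely for illustration.

```java
import java.util.Set;

// Hypothetical, simplified safe-mode exit check; not the real FSNamesystem logic.
final class ToySafeMode {

    // True if the fraction of expected blocks that DataNodes reported
    // meets the (assumed) threshold.
    static boolean canLeaveSafeMode(Set<Long> expected, Set<Long> reported, double thresholdPct) {
        if (expected.isEmpty()) {
            return true;
        }
        long safe = expected.stream().filter(reported::contains).count();
        return (double) safe / expected.size() >= thresholdPct;
    }

    public static void main(String[] args) {
        // Block 3 was allocated by addBlock and logged, but the client never got
        // the response, so no DataNode ever stored it and it is never reported.
        Set<Long> reported = Set.of(1L, 2L);
        System.out.println(canLeaveSafeMode(Set.of(1L, 2L, 3L), reported, 1.0)); // prints false
        // If the 0-sized block is removed on delete, it is no longer expected.
        System.out.println(canLeaveSafeMode(Set.of(1L, 2L), reported, 1.0));     // prints true
    }
}
```

With the block from step 1 still expected but never stored on any DataNode, the check fails; once that 0-sized block is removed from the expected set, as proposed in the comment, it passes.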



--
This message was sent by Atlassian JIRA
(v6.1#6144)
