hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11714) Newly added NN storage directory won't get initialized and cause space exhaustion
Date Mon, 01 May 2017 19:18:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991339#comment-15991339 ]

Kihwal Lee commented on HDFS-11714:
-----------------------------------

bq. What if a VERSION file already exists in the directory for some reason? Should we at least
print a WARN for further investigation?
The equivalent code for the non-HA case (saveNamespace) also unconditionally overwrites an existing
VERSION. The reasoning is that, regardless of its previous state, the directory now has the
up-to-date checkpoint, so it should have an accompanying VERSION file.  Overwriting an existing
VERSION is therefore expected. I don't think we need to do anything here.
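
The "always overwrite" behavior described above can be illustrated with a minimal sketch. This is not Hadoop's actual code: the class name {{VersionWriteSketch}}, the method {{writeVersion}}, the temporary file name and the use of {{java.util.Properties}} are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Hypothetical sketch; names here are illustrative and not taken from NNStorage.
final class VersionWriteSketch {

    // Writes VERSION into currentDir after a checkpoint, replacing any
    // VERSION file that is already there, whatever its contents.
    static void writeVersion(Path currentDir, Properties props) throws IOException {
        // Write to a temporary file first so a crash does not leave a half-written VERSION.
        Path tmp = currentDir.resolve("VERSION.tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            props.store(out, "written after checkpoint");
        }
        // REPLACE_EXISTING makes the overwrite unconditional.
        Files.move(tmp, currentDir.resolve("VERSION"), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With this shape there is no branch on whether VERSION already exists, which matches the reasoning that a fresh checkpoint should always come with a fresh VERSION.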

bq. On the retention manager, is it the right behavior to skip purging old image files if
VERSION is missing? Should we do a follow-on fix to handle the case where the VERSION file
is lost for some other reason (mis-operation etc.)?
At minimum, it already logs a WARN. What do you think should be done? Report a storage error
by calling {{reportErrorsOnDirectory()}}? This will put the storage dir in the "failed"
list, from which it will be recovered later online.  In that case, the recovery check should
be made to check for the existence of VERSION.
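
A rough sketch of what such a check could look like follows. It does not call {{reportErrorsOnDirectory()}} or any other Hadoop API. The class {{VersionCheckSketch}} and its methods are hypothetical, and the {{current/VERSION}} layout under each storage root is an assumption of the sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hadoop code base.
final class VersionCheckSketch {

    // Assumed layout: <storageRoot>/current/VERSION
    static boolean hasVersionFile(Path storageRoot) {
        return Files.isRegularFile(storageRoot.resolve("current").resolve("VERSION"));
    }

    // Returns the storage roots that would be reported as failed
    // because their VERSION file is missing.
    static List<Path> dirsToReportAsFailed(List<Path> storageRoots) {
        List<Path> failed = new ArrayList<>();
        for (Path root : storageRoots) {
            if (!hasVersionFile(root)) {
                failed.add(root);
            }
        }
        return failed;
    }

    // Recovery check: a failed directory only counts as recovered when it is
    // writable again AND has a VERSION file.
    static boolean isRecovered(Path storageRoot) {
        return Files.isWritable(storageRoot) && hasVersionFile(storageRoot);
    }
}
```

The point of the second method is that a directory that is merely writable again would not be put back in service until VERSION is present.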


> Newly added NN storage directory won't get initialized and cause space exhaustion
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-11714
>                 URL: https://issues.apache.org/jira/browse/HDFS-11714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.3
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-11714.trunk.patch, HDFS-11714.v2.branch-2.patch, HDFS-11714.v2.trunk.patch
>
>
> When an empty namenode storage directory is detected on normal NN startup, it may not
> be fully initialized. The new directory is still part of the "in-service" NNStorage, and when a
> checkpoint image is uploaded, a copy will also be written there.  However, the retention manager
> won't be able to purge old files since the directory lacks a VERSION file.  This causes fsimages
> to pile up in the directory.  With a big namespace, the disk will fill up within days or
> weeks.
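
The failure mode in the issue description above can be reproduced in miniature. The sketch below is not the real retention manager: {{RetentionSketch}}, its method names and the simplified {{fsimage_<txid>}} file naming are assumptions for illustration. It only shows how a purge step that is skipped when VERSION is missing lets image files accumulate.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch of a retention pass over one storage directory.
final class RetentionSketch {

    // Assumes image files are named fsimage_<txid>.
    static long txId(Path image) {
        return Long.parseLong(image.getFileName().toString().substring("fsimage_".length()));
    }

    // Lists image files in currentDir, newest (highest txid) first.
    static List<Path> listImages(Path currentDir) throws IOException {
        List<Path> images = new ArrayList<>();
        try (Stream<Path> entries = Files.list(currentDir)) {
            entries.filter(p -> p.getFileName().toString().matches("fsimage_\\d+"))
                   .forEach(images::add);
        }
        images.sort((a, b) -> Long.compare(txId(b), txId(a)));
        return images;
    }

    // Deletes all but the numToRetain newest images and returns what was deleted.
    // If VERSION is missing the directory is skipped, so nothing is ever purged.
    static List<Path> purgeOldImages(Path currentDir, int numToRetain) throws IOException {
        List<Path> purged = new ArrayList<>();
        if (!Files.isRegularFile(currentDir.resolve("VERSION"))) {
            return purged; // skipped: this is where images start to pile up
        }
        List<Path> images = listImages(currentDir);
        for (int i = numToRetain; i < images.size(); i++) {
            Files.delete(images.get(i));
            purged.add(images.get(i));
        }
        return purged;
    }
}
```

Calling {{purgeOldImages}} on a directory that receives a new image at every checkpoint but never gets a VERSION file always returns an empty list, so the number of images grows without bound.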



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

