hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11714) Newly added NN storage directory won't get initialized and cause space exhaustion
Date Thu, 27 Apr 2017 21:44:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987756#comment-15987756 ]

Kihwal Lee commented on HDFS-11714:
-----------------------------------

The original pre-HA design was to simply create the directory structure under the new directory.
The inspector then reports that some directories are new, which causes the namesystem to call
{{saveNamespace()}}. That call unconditionally writes a VERSION file in all storage directories.
This still happens in non-HA mode.

For HA, the first part still happens, but {{saveNamespace()}} is not called automatically.
{noformat}
[main] INFO namenode.FSImage: Storage directory /xxx/hadoop/var/hdfs/namedir1 is not formatted.
[main] INFO namenode.FSImage: Formatting ...
...
WARN namenode.FSImage: Storage directory Storage Directory/xxx/hadoop/var/hdfs/namedir1 contains no VERSION file. Skipping...
{noformat}
The last line is logged while the storage directories are being searched for an fsimage to load.

When a checkpoint is uploaded, the retention manager fails to delete old files in that directory.
{noformat}
INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_00000001234567890123 size 20000000000000 bytes.
INFO namenode.FSImageTransactionalStorageInspector: No version file in /xxx/hadoop/var/hdfs/namedir1
INFO namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 1234567890122
INFO namenode.NNStorageRetentionManager: Purging old image
{noformat}
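
To make the failure mode concrete, here is a minimal sketch of a purge pass that skips any storage directory without a VERSION file. It is a simplified model written for illustration, not Hadoop's NNStorageRetentionManager code. The class and method names are hypothetical, and the {{<storage dir>/current/VERSION}} and {{fsimage_<txid>}} layout is assumed. Edit logs and .md5 files are ignored.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model only; names are hypothetical and this is not Hadoop code.
class RetentionSketch {
    static final String IMAGE_PREFIX = "fsimage_";

    // Assumed layout: <storageDir>/current/VERSION
    static boolean hasVersionFile(Path storageDir) {
        return Files.isRegularFile(storageDir.resolve("current").resolve("VERSION"));
    }

    // Returns the txid encoded in an fsimage_<txid> file name, or -1 if the suffix is not numeric.
    static long txIdOf(Path image) {
        String suffix = image.getFileName().toString().substring(IMAGE_PREFIX.length());
        return suffix.matches("\\d{1,18}") ? Long.parseLong(suffix) : -1L;
    }

    // Lists fsimage_<txid> files under <storageDir>/current; other files (e.g. .md5) are ignored.
    static List<Path> listImages(Path storageDir) throws IOException {
        List<Path> images = new ArrayList<>();
        Path current = storageDir.resolve("current");
        if (!Files.isDirectory(current)) {
            return images;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(current, IMAGE_PREFIX + "*")) {
            for (Path p : stream) {
                if (txIdOf(p) >= 0) {
                    images.add(p);
                }
            }
        }
        return images;
    }

    // Keeps the newest numToRetain txids and deletes older images, but only in
    // directories that have a VERSION file. Returns the number of files deleted.
    static int purgeOldImages(List<Path> storageDirs, int numToRetain) throws IOException {
        if (numToRetain < 1) {
            throw new IllegalArgumentException("numToRetain must be at least 1");
        }
        List<Path> visible = new ArrayList<>();
        for (Path dir : storageDirs) {
            if (!hasVersionFile(dir)) {
                continue; // skipped: images in this directory are never considered
            }
            visible.addAll(listImages(dir));
        }
        List<Long> txIds = new ArrayList<>();
        for (Path p : visible) {
            long txId = txIdOf(p);
            if (!txIds.contains(txId)) {
                txIds.add(txId);
            }
        }
        if (txIds.size() <= numToRetain) {
            return 0;
        }
        Collections.sort(txIds, Collections.reverseOrder());
        long minTxIdToRetain = txIds.get(numToRetain - 1);
        int deleted = 0;
        for (Path p : visible) {
            if (txIdOf(p) < minTxIdToRetain) {
                Files.delete(p);
                deleted++;
            }
        }
        return deleted;
    }
}
```

In this model, every uploaded checkpoint adds a file to the directory without a VERSION file, while only the other directories are ever trimmed.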

> Newly added NN storage directory won't get initialized and cause space exhaustion
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-11714
>                 URL: https://issues.apache.org/jira/browse/HDFS-11714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.3
>            Reporter: Kihwal Lee
>            Priority: Critical
>
> When an empty namenode storage directory is detected on normal NN startup, it may not
> be fully initialized. The new directory is still part of the "in-service" NNStorage, and when a
> checkpoint image is uploaded, a copy will also be written there.  However, the retention manager
> won't be able to purge old files there, since the directory lacks a VERSION file.  This causes
> fsimages to pile up in the directory.  With a big name space, the disk will fill up within
> days or weeks.
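
As a rough illustration of the "days or weeks" estimate, the following sketch divides the free space by the amount written per day when no image is ever purged. The class name, method name, and all numbers are hypothetical and are not taken from the report. It assumes every checkpoint writes one full-size fsimage and that nothing else consumes space in the directory.

```java
// Back-of-the-envelope estimate; names and numbers are hypothetical.
class FillEstimate {
    // Days until the disk is full if each checkpoint adds one image and none is purged.
    static double daysUntilFull(long freeBytes, long imageBytes, double checkpointsPerDay) {
        if (freeBytes < 0 || imageBytes <= 0 || checkpointsPerDay <= 0) {
            throw new IllegalArgumentException("sizes and rate must be positive");
        }
        return freeBytes / (imageBytes * checkpointsPerDay);
    }

    public static void main(String[] args) {
        // Assumed example: 2 TB free, 20 GB per fsimage, one checkpoint per hour.
        double days = daysUntilFull(2_000_000_000_000L, 20_000_000_000L, 24.0);
        System.out.printf("Disk full in about %.1f days%n", days);
    }
}
```

With these assumed numbers the directory fills in roughly four days. A smaller image or a longer checkpoint interval stretches that to weeks.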



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


