hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6186) Pause deletion of blocks when the namenode starts up
Date Mon, 12 May 2014 22:02:15 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-6186:
----------------------------

    Attachment: HDFS-6186.002.patch

Thanks for the comments, Ming and Suresh!

After an offline discussion with [~sureshms], it looks like a simpler mechanism would be to
delay block deletion for a specific period of time after NN startup, regardless of whether the
invalid block comes from a normal file deletion or from an unknown block in a block report.

I have updated the patch to implement this idea. The patch adds a new configuration property, "dfs.block.pending.invalidation.ms",
to control the delay period, and handles all of the delay logic in InvalidateBlocks.
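
To make the idea concrete, here is a minimal standalone sketch of the delay check. It is not the code in HDFS-6186.002.patch: the class name, method names, and the use of plain block IDs are all hypothetical, and it assumes the property value is a duration in milliseconds (as the ".ms" suffix suggests). Only the property name comes from this comment.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of delaying block invalidation after startup.
 * This is NOT the InvalidateBlocks class from the patch.
 */
class DelayedInvalidationSketch {
    // Property name as given in the comment; reading it from a real
    // Configuration object is left out to keep the sketch self-contained.
    static final String DELAY_KEY = "dfs.block.pending.invalidation.ms";

    private final long startupTimeMs;   // when the (hypothetical) NN started
    private final long delayMs;         // assumed to be the configured delay
    private final List<Long> pendingBlockIds = new ArrayList<>();

    DelayedInvalidationSketch(long startupTimeMs, long delayMs) {
        this.startupTimeMs = startupTimeMs;
        this.delayMs = delayMs;
    }

    /** Queue a block for deletion, whatever the reason it became invalid. */
    void add(long blockId) {
        pendingBlockIds.add(blockId);
    }

    /** Milliseconds left before deletions may start; 0 once the delay is over. */
    long remainingDelayMs(long nowMs) {
        return Math.max(0L, startupTimeMs + delayMs - nowMs);
    }

    /**
     * Return up to 'limit' blocks that may be deleted now. While the startup
     * delay is still running, nothing is handed out and the queue is untouched.
     */
    List<Long> pollDeletable(long nowMs, int limit) {
        List<Long> out = new ArrayList<>();
        if (remainingDelayMs(nowMs) > 0) {
            return out;
        }
        int n = Math.min(limit, pendingBlockIds.size());
        List<Long> head = pendingBlockIds.subList(0, n);
        out.addAll(head);
        head.clear(); // clearing the sublist view removes the entries from the queue
        return out;
    }

    int numPending() {
        return pendingBlockIds.size();
    }
}
```

In this sketch a caller would pass the current time on each poll; until startup time plus the delay has passed, the pending count stays visible but no block is released for deletion.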

> Pause deletion of blocks when the namenode starts up
> ----------------------------------------------------
>
>                 Key: HDFS-6186
>                 URL: https://issues.apache.org/jira/browse/HDFS-6186
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Suresh Srinivas
>            Assignee: Jing Zhao
>         Attachments: HDFS-6186.000.patch, HDFS-6186.002.patch
>
>
> The HDFS namenode can delete blocks very quickly, given that deletion happens as a parallel
> operation spread across many datanodes. One of the frequent anxieties I see is that a lot
> of data can be deleted very quickly when a cluster is brought up, especially when one of
> the storage directories has failed and namenode metadata was copied from another storage.
> Copying the wrong metadata would result in some of the newer files (if old metadata was copied)
> being deleted along with their blocks.
> HDFS-5986 now captures the number of pending deletion blocks on the namenode web UI and JMX.
> I propose pausing deletion of blocks for a configured period of time (default 1 hour?) after
> the namenode comes out of safemode. This will give the administrator enough time to notice
> a large number of pending deletion blocks and take corrective action.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
