hadoop-general mailing list archives

From Eli Collins <...@cloudera.com>
Subject Re: DFS Recovery with fsck
Date Tue, 19 Jan 2010 18:41:24 GMT
On Mon, Jan 18, 2010 at 5:55 AM, Day, Phil <philip.day@hp.com> wrote:
> Hi All,
>
> Can anyone help me with the following please:
>
> I have a 20.1 cluster where I've been doing some testing on recovering from various namenode failure scenarios.
>
> The current problem I've managed to create is one where some directories and the files within them were deleted, the cluster was then stopped, and the edits file was lost.
>
> On restart, dfs stays in safemode as there are blocks missing (the image knows about the directories; the datanodes don't have the blocks for them). Fsck correctly identifies the missing blocks.
>
> I then take dfs out of safe mode and run "fsck -delete" (to get rid of the corrupt files). After that, a further fsck run reports the filesystem as healthy (and an ls shows the directories as empty).
>
> However, if I now stop the cluster and restart it, it comes back into the same state. It's as if the results of the "fsck -delete" aren't persisted.
>
> Any thoughts on what's happening, and what I need to do to tidy up, would be very welcome.

Hey Phil,

Just to confirm: after restarting following the fsck -delete, if you
hadoop fs -ls the directory you deleted, the files have returned?
It sounds like the edits generated by fsck -delete are getting lost.
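
Something like the following should show whether the delete actually
stuck (the path is just a placeholder for one of the directories you
deleted):

  hadoop fs -ls /path/you/deleted
  hadoop fsck /path/you/deleted -files -blocks

If the files show up again and fsck flags their blocks as missing, the
delete didn't make it into the persisted namespace.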

When you shut down after the fsck -delete, are all copies of the fsimage
and edits the same (assuming you've got multiple fs.name.dirs)?
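
For example, assuming your name dirs are /data/1/dfs/nn and
/data/2/dfs/nn (substitute whatever dfs.name.dir is set to on your
cluster):

  ls -l /data/1/dfs/nn/current/ /data/2/dfs/nn/current/
  md5sum /data/1/dfs/nn/current/fsimage /data/2/dfs/nn/current/fsimage
  md5sum /data/1/dfs/nn/current/edits /data/2/dfs/nn/current/edits

If the checksums differ, the namenode may be loading a stale copy on
restart.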

Seeing the namenode edit log before/during the fsck -delete and on
restart would be helpful.
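
If you still have them, the namenode's own log files from that window
would also be useful; in a default install they usually live somewhere
like:

  $HADOOP_HOME/logs/hadoop-<user>-namenode-<host>.log

(the exact name depends on how the daemons are started).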

Thanks,
Eli
