hadoop-common-user mailing list archives

From C G <parallel...@yahoo.com>
Subject Re: When is HDFS really "corrupt"...(and can I upgrade a corrupt FS?)
Date Thu, 15 May 2008 20:46:54 GMT
Lohit:  Awesome, thanks very much.  I deleted that file (and the other spurious task files
lying around) and the file system is now HEALTHY.
  I really appreciate the help!
  C G 
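
[For later readers, a minimal sketch of the cleanup described above. The commands assume
the 0.15/0.16-era shell (hadoop dfs); the path is the one quoted below in this thread:

  # Remove the half-written reducer output that fsck flagged as MISSING,
  # along with its spurious _task_ directory (-rmr is recursive).
  hadoop dfs -rmr /output/ae/_task_200803191317_9183_r_000008_1

  # Re-run fsck; the summary should now end with HEALTHY.
  hadoop fsck /
]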

lohit <lohit_bv@yahoo.com> wrote:
  Yes, that file is a temp file used by one of your reducers. It was opened but never closed,
so the namenode does not know the location information for the last block of such files.
In hadoop-0.18 we have an option to filter out files which are still open, and not count them
toward the filesystem being CORRUPT.
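
[This presumably refers to the -openforwrite option that fsck gained around that release.
A sketch of how one might surface such still-open files, assuming that flag and its
OPENFORWRITE output marker are available in your build:

  # Include files open for write in the report, then pick out just those entries.
  hadoop fsck / -openforwrite | grep OPENFORWRITE
]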

----- Original Message ----
From: C G 

To: core-user@hadoop.apache.org
Sent: Thursday, May 15, 2008 12:51:55 PM
Subject: Re: When is HDFS really "corrupt"...(and can I upgrade a corrupt FS?)

I hadn't considered looking for the word MISSING...thanks for the heads-up. I did a search
and found the following:

/output/ae/_task_200803191317_9183_r_000008_1/part-00008 0, 390547 block(s): MISSING 1 blocks
of total size 0 B
0. -7099420740240431420 len=0 MISSING!

That's the only one found. Is it safe/sufficient to simply delete this file? 
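
[A quick sketch of how one might confirm it is the only affected file, using the same fsck
invocation described below; /tmp/fsck.out is just an illustrative path:

  # Save a full report, then count how many MISSING entries it contains.
  hadoop fsck / -files -blocks -locations > /tmp/fsck.out
  grep -c MISSING /tmp/fsck.out
]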

There were MR jobs active when the master failed...it wasn't a clean shutdown by any means.
I surmise this file is a remnant of an active job.

C G 

Lohit wrote:
The filesystem is considered corrupt if there are any missing blocks. Do you see MISSING in
your output? We also see missing blocks for files that have not been closed yet. When you
stopped the MR cluster, were there any jobs running?

On May 15, 2008, at 12:15 PM, C G wrote:

Earlier this week I wrote about a master node crash and our efforts to recover from it. We
have recovered and all systems are normal. However, I have a concern about what fsck is
reporting and what it really means for a filesystem to be marked "corrupt."

With the mapred engine shut down, I ran fsck / -files -blocks -locations to inspect the file
system. The output looks clean except for this at the end:

Total size: 5113667836544 B
Total blocks: 1070996 (avg. block size 4774684 B)
Total dirs: 50012
Total files: 1027089
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Target replication factor: 3
Real replication factor: 3.0

The filesystem under path '/' is CORRUPT

In reviewing the fsck output, there are no obvious errors being reported. I see tons of output
like this:

/foo/bar/part-00005 3387058, 6308 block(s): OK
0. 4958936159429948772 len=3387058 repl=3 [,,]

and the only status ever reported is "OK."

So this raises the question: what causes HDFS to declare the FS "corrupt", and how do I
clear this up?

The second question, assuming that I can't make the "corrupt" state go away, concerns running
an upgrade. If every file in HDFS reports "OK" but the FS reports "corrupt", is it safe to
undertake an upgrade from 0.15.x to 0.16.4?

Thanks for any help....
