hadoop-common-user mailing list archives

From C G <parallel...@yahoo.com>
Subject Re: HDFS corrupt...how to proceed?
Date Mon, 12 May 2008 18:40:57 GMT
Thanks to everyone who responded.   Things are back on the air now - all the replication issues
seem to have gone away.  I am wading through a detailed fsck output now looking for specific
problems on a file-by-file basis.
  Just in case anybody is interested, we mirror our master nodes using DRBD.  It performed
very well in this first "real world" test.  If there is interest I can write up how we protect
our master nodes in more detail and share w/the community.
  C G

Ted Dunning <tdunning@veoh.com> wrote:

You don't need to correct over-replicated files.

The under-replicated files should cure themselves, but there is a problem on
old versions where that doesn't happen quite right.

You can use hadoop fsck / to get a list of the files that are broken, and
there are options to copy what remains of them to lost+found or to delete
them.
Other than that, things should correct themselves fairly quickly.
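
As a rough illustration of the fsck options mentioned above, here is a minimal Python sketch that only builds the command lines and runs nothing. The helper name fsck_command is hypothetical; the -files, -blocks, -locations, -move and -delete flags are the standard hadoop fsck options, but check hadoop fsck usage on your own version before relying on them.

```python
def fsck_command(path="/", action=None, verbose=False):
    """Build an argv list for `hadoop fsck` (hypothetical helper; nothing is executed)."""
    argv = ["hadoop", "fsck", path]
    if verbose:
        # Per-file detail: list files, their blocks, and block locations.
        argv += ["-files", "-blocks", "-locations"]
    if action == "move":
        # Copy what remains of corrupted files to /lost+found.
        argv.append("-move")
    elif action == "delete":
        # Delete corrupted files outright.
        argv.append("-delete")
    elif action is not None:
        raise ValueError("action must be None, 'move' or 'delete'")
    return argv


if __name__ == "__main__":
    # Report only, then the two repair variants.
    for action in (None, "move", "delete"):
        print(" ".join(fsck_command("/", action=action)))
```

Running the report-only form first and reviewing its output before choosing -move or -delete seems the safer order, since -delete is not reversible.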

On 5/11/08 8:23 PM, "C G" wrote:

> Hi All:
> We had a primary node failure over the weekend. When we brought the node
> back up and I ran Hadoop fsck, I see the file system is corrupt. I'm unsure
> how best to proceed. Any advice is greatly appreciated. If I've missed a
> Wiki page or documentation somewhere please feel free to tell me to RTFM and
> let me know where to look.
> Specific question: how to clear under and over replicated files? Is the
> correct procedure to copy the file locally, delete from HDFS, and then copy
> back to HDFS?
> The fsck output is long, but the final summary is:
> Total size: 4899680097382 B
> Total blocks: 994252 (avg. block size 4928006 B)
> Total dirs: 47404
> Total files: 952070
> ********************************
> MISSING SIZE: 1501009630 B
> ********************************
> Over-replicated blocks: 1 (1.0057812E-4 %)
> Under-replicated blocks: 14958 (1.5044476 %)
> Target replication factor: 3
> Real replication factor: 2.9849212
> The filesystem under path '/' is CORRUPT
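
The percentages in the quoted fsck summary follow directly from the block counts. A minimal Python sketch, assuming the summary line format shown in the message above (the function names are hypothetical, and the sample text is copied from that summary):

```python
import re

# Lines copied from the fsck summary quoted in the message above.
SUMMARY = """\
Total size: 4899680097382 B
Total blocks: 994252 (avg. block size 4928006 B)
Over-replicated blocks: 1 (1.0057812E-4 %)
Under-replicated blocks: 14958 (1.5044476 %)
"""


def parse_counts(text):
    """Map each 'Label: <integer>' summary line to its integer value."""
    counts = {}
    for line in text.splitlines():
        # Match an integer directly after the colon; skip decimals such as 2.98.
        m = re.match(r"\s*([A-Za-z\- ]+?):\s+(\d+)(?=\s|$)", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


def percent_of_blocks(counts, label):
    """Share of all blocks, in percent, that fall under the given label."""
    return 100.0 * counts[label] / counts["Total blocks"]


if __name__ == "__main__":
    counts = parse_counts(SUMMARY)
    print("under-replicated %:", percent_of_blocks(counts, "Under-replicated blocks"))
    print("avg block size (B):", counts["Total size"] // counts["Total blocks"])
```

The computed under-replicated share agrees with the 1.5044476 % that fsck reported, and the integer average block size matches the 4928006 B in the summary.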
