hadoop-common-user mailing list archives

From lohit <lohit...@yahoo.com>
Subject Re: Corrupt HDFS and salvaging data
Date Fri, 09 May 2008 05:33:56 GMT
Hi Otis,

The namenode has location information for all replicas of every block. When you run fsck, the
namenode checks those replicas. If all replicas of a block are missing, fsck reports the block
as missing; otherwise the block is counted as under-replicated. If you specify the -move or
-delete option along with fsck, files with such missing blocks are moved to /lost+found or
deleted, depending on the option.
At what point did you run the fsck command? Was it after the datanodes were stopped? When
you run namenode -format, it deletes the directories specified in dfs.name.dir. If a directory
exists, it asks for confirmation.
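As a sketch of the commands involved (standard Hadoop CLI usage; the exact report format varies by version):

```
# Check the health of the whole filesystem and show per-file block details.
bin/hadoop fsck / -files -blocks -locations

# Move files that have missing blocks (no surviving replicas) to /lost+found;
# the blocks that still exist are kept there.
bin/hadoop fsck / -move

# Or delete files with missing blocks outright (their remaining data is lost).
bin/hadoop fsck / -delete
```

Note that -move and -delete only act on files whose blocks have no replicas left; merely under-replicated blocks are re-replicated by the namenode on its own.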

Thanks,
Lohit

----- Original Message ----
From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
To: core-user@hadoop.apache.org
Sent: Thursday, May 8, 2008 9:00:34 PM
Subject: Re: Corrupt HDFS and salvaging data

Hi,

Update:
It seems fsck reports HDFS as corrupt when a significant enough number of block replicas is
missing (or something like that).
fsck reported a corrupt HDFS after I replaced 1 old DN with 1 new DN.  After I restarted Hadoop
with the old set of DNs, fsck stopped reporting a corrupt HDFS and started reporting a *healthy*
HDFS.


I'll follow up with a re-balancing question in a separate email.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 11:35:01 PM
> Subject: Corrupt HDFS and salvaging data
> 
> Hi,
> 
> I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm trying 
> not to lose the precious data in it.  I accidentally ran bin/hadoop namenode 
> -format on a *new DN* that I had just added to the cluster.  Is it possible for 
> that to corrupt HDFS?  I also had to kill the DN daemons explicitly before that, 
> because bin/stop-all.sh didn't stop them for some reason (it always had before).
> 
> Is there any way to salvage the data?  I have a 4-node cluster with replication 
> factor of 3, though fsck reports lots of under-replicated blocks:
> 
>   ********************************
>   CORRUPT FILES:        3355
>   MISSING BLOCKS:       3462
>   MISSING SIZE:         17708821225 B
>   ********************************
> Minimally replicated blocks:   28802 (89.269775 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       17025 (52.76779 %)
> Mis-replicated blocks:         0 (0.0 %)
> Default replication factor:    3
> Average block replication:     1.7750744
> Missing replicas:              17025 (29.727087 %)
> Number of data-nodes:          4
> Number of racks:               1
> 
> 
> The filesystem under path '/' is CORRUPT
> 
> 
> What can one do at this point to save the data?  If I run bin/hadoop fsck -move 
> or -delete, will I lose some of the data?  Or will I simply end up with fewer 
> block replicas and thus have to force re-balancing in order to get back to 
> a "safe" number of replicas?
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
