hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Corrupt HDFS and salvaging data
Date Fri, 09 May 2008 04:00:34 GMT
Hi,

Update:
It seems fsck reports HDFS as corrupt when a large enough number of block
replicas is missing (or something like that).
fsck reported a corrupt HDFS after I replaced one old DN with one new DN.  After
I restarted Hadoop with the old set of DNs, fsck stopped reporting a corrupt
HDFS and started reporting a *healthy* HDFS.
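
(For reference: assuming a stock install with the hadoop script in bin/,
something like

  # list every file, its blocks, and which DNs hold each replica
  bin/hadoop fsck / -files -blocks -locations

should show exactly which files and blocks are affected.)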


I'll follow up with a re-balancing question in a separate email.
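
(For the impatient: my understanding is that Hadoop 0.16+ ships a balancer,
so re-balancing would presumably look something like

  # even out disk usage across DNs; -threshold is the allowed
  # deviation from average utilization, in percent
  bin/start-balancer.sh -threshold 10

Note that the balancer only evens out disk usage -- it does not create
missing replicas; the namenode's automatic re-replication handles that.)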

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 11:35:01 PM
> Subject: Corrupt HDFS and salvaging data
> 
> Hi,
> 
> I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm trying 
> not to lose the precious data in it.  I accidentally ran bin/hadoop namenode 
> -format on a *new DN* that I had just added to the cluster.  Is it possible 
> for that to corrupt HDFS?  I also had to explicitly kill the DN daemons before 
> that, because bin/stop-all.sh didn't stop them for some reason (it always did 
> before).
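> 
> (In case it matters: a cleaner way to stop a single DN -- assuming the 
> standard per-daemon script is present -- would presumably have been 
> 
>   # run on the DN host itself; stops only the datanode daemon 
>   bin/hadoop-daemon.sh stop datanode 
> 
> but in this case I resorted to kill.)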
> 
> Is there any way to salvage the data?  I have a 4-node cluster with a 
> replication factor of 3, though fsck reports lots of under-replicated blocks:
> 
>   ********************************
>   CORRUPT FILES:        3355
>   MISSING BLOCKS:       3462
>   MISSING SIZE:         17708821225 B
>   ********************************
> Minimally replicated blocks:   28802 (89.269775 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       17025 (52.76779 %)
> Mis-replicated blocks:         0 (0.0 %)
> Default replication factor:    3
> Average block replication:     1.7750744
> Missing replicas:              17025 (29.727087 %)
> Number of data-nodes:          4
> Number of racks:               1
> 
> 
> The filesystem under path '/' is CORRUPT
> 
> 
> What can one do at this point to save the data?  If I run bin/hadoop fsck -move 
> or -delete, will I lose some of the data?  Or will I simply end up with fewer 
> block replicas and thus have to force re-balancing in order to get back to a 
> "safe" number of replicas?
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

