hadoop-common-user mailing list archives

From Uma Maheswara Rao G 72686 <mahesw...@huawei.com>
Subject Re: lost data with 1 failed datanode and replication factor 3 in 6 node cluster
Date Sat, 22 Oct 2011 07:02:28 GMT
----- Original Message -----
From: Ossi <lossil@gmail.com>
Date: Friday, October 21, 2011 2:57 pm
Subject: lost data with 1 failed datanode and replication factor 3 in 6 node cluster
To: common-user@hadoop.apache.org

> hi,
> 
> We managed to lose data when 1 datanode broke down in a cluster of 6
> datanodes with replication factor 3.
> 
> As far as I know, that shouldn't happen, since each block should have 1
> copy on each of 3 different hosts. So losing even 2 nodes should be fine.
> 
> Earlier we did some tests with replication factor 2, but reverted from that:
>   88  2011-10-12 06:46:49 hadoop dfs -setrep -w 2 -R /
>  148  2011-10-12 10:22:09 hadoop dfs -setrep -w 3 -R /
> 
> The lost data was generated after the replication factor was set back to 3.
First of all, how are you measuring the data loss?
Are any reads failing with missing-block exceptions?

My guess is that you are measuring data loss by the DFS Used space. If I am correct, DFS Used
is calculated only from the DataNodes that are currently available.
So when one DataNode goes down, DFS Used and DFS Remaining will both drop accordingly. This
cannot be taken as data loss.
Please correct me if I have misunderstood the question.
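
To illustrate the point, here is a minimal Python sketch. It is not Hadoop code: the cluster model, the 64 MB block size, the node names and all function names are assumptions made up for this example. It places 3 replicas of each block on distinct nodes of a 6-node cluster, takes one node away, and compares the used space summed over live nodes with the number of blocks that have no live replica left.

```python
import random

BLOCK_SIZE_MB = 64  # assumed block size, for illustration only


def place_blocks(num_blocks, nodes, replication, seed=0):
    """Map each block id to the set of distinct nodes holding a replica."""
    rng = random.Random(seed)
    return {b: set(rng.sample(nodes, replication)) for b in range(num_blocks)}


def dfs_used_mb(placement, live_nodes):
    """Sum replica sizes over live nodes only (the behaviour guessed at above)."""
    return sum(BLOCK_SIZE_MB * len(holders & live_nodes)
               for holders in placement.values())


def missing_blocks(placement, live_nodes):
    """Blocks with no replica on any live node, i.e. actual data loss."""
    return [b for b, holders in placement.items() if not (holders & live_nodes)]


# Hypothetical 6-node cluster with replication factor 3.
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"]
placement = place_blocks(num_blocks=100, nodes=nodes, replication=3)

all_live = set(nodes)
after_failure = all_live - {"dn3"}  # one datanode goes down

used_before = dfs_used_mb(placement, all_live)
used_after = dfs_used_mb(placement, after_failure)
lost = missing_blocks(placement, after_failure)

print("used before failure (MB):", used_before)
print("used after failure (MB): ", used_after)
print("blocks with no live replica:", len(lost))
```

In this model the used-space figure drops as soon as the node disappears, yet no block is missing, because every block still has at least two replicas on the remaining nodes. A drop in the used figure alone therefore says nothing about whether data was lost; failing reads or missing-block reports would.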
> And even if the replication factor had been 2, data shouldn't have been
> lost, right?
> 
> We wonder how that is possible and in what situations that could happen?
> 
> br, Ossi
> 
Regards,
Uma
