hadoop-common-dev mailing list archives

From Hairong Kuang <hair...@yahoo-inc.com>
Subject Re: [RFC] Replication Level policy improvement
Date Wed, 02 Apr 2008 16:36:25 GMT
HDFS already guarantees the replication level. When a datanode is declared
dead, HDFS automatically re-replicates every block that was on that datanode,
as long as at least one other datanode still holds a replica. When the
datanode rejoins the cluster, HDFS removes the excess replicas.
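
As a rough illustration of the behaviour described above, here is a toy
Python model. It is not HDFS code: the class ToyNameNode, its method names,
and the rule for choosing which replica to copy to or drop are all
assumptions made up for this sketch.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model of replication bookkeeping. All names are hypothetical."""

    def __init__(self, target_replication):
        self.target = target_replication
        self.live = set()                  # datanodes currently alive
        self.replicas = defaultdict(set)   # block id -> datanodes holding it

    def add_datanode(self, node, blocks=()):
        # A node joining (or rejoining) reports the blocks it has on disk.
        self.live.add(node)
        for block in blocks:
            self.replicas[block].add(node)
        self._restore_replication()

    def datanode_dead(self, node):
        # Forget every replica that lived on the dead node.
        self.live.discard(node)
        for holders in self.replicas.values():
            holders.discard(node)
        self._restore_replication()

    def _restore_replication(self):
        for holders in self.replicas.values():
            if not holders:
                continue  # no surviving replica left to copy from
            # Under-replicated: copy to live nodes that lack the block.
            candidates = sorted(self.live - holders)
            while len(holders) < self.target and candidates:
                holders.add(candidates.pop(0))
            # Over-replicated: drop extras (arbitrary but deterministic pick).
            while len(holders) > self.target:
                holders.remove(max(holders))
```

With a target of 3 and four nodes, marking one holder dead moves the block
to the spare node, and letting the dead node rejoin with its old blocks
brings the count to 4, after which one replica is dropped again.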

Rejoining does carry a risk if the datanode holds an obsolete copy of a
block, so it is recommended that the administrator remove the old data from
the disk before bringing the node back into the cluster.
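
To make the risk concrete, here is a small Python sketch of how reported
blocks could be split into current and obsolete ones. It assumes each block
carries a version number that can be compared; that field and the function
classify_block_report are hypothetical and are not a description of how HDFS
tracks blocks.

```python
def classify_block_report(current, reported):
    """Split a rejoining node's blocks into valid and stale lists.

    current:  block id -> version the cluster considers up to date
    reported: block id -> version found on the rejoining node's disk
    Both mappings and the version field are assumptions of this sketch.
    """
    valid, stale = [], []
    for block_id, version in sorted(reported.items()):
        if current.get(block_id) == version:
            valid.append(block_id)
        else:
            # Unknown block, or a version that differs from the current one.
            stale.append(block_id)
    return valid, stale
```

Wiping the disk before the node rejoins, as recommended above, makes the
reported set empty, so nothing obsolete can be mistaken for a valid replica.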


On 4/2/08 7:01 AM, "Alfonso Olias Sanz" <alfonso.olias.sanz@gmail.com> wrote:

> Hi Hadoopers,
> I opened a discussion on the core-users list about the replication
> level. Whenever a data node is dead, can all the blocks (files)
> contained in that node be considered lost?
> And if that node never comes back, or at least takes a long time
> to be ready again, some files can have their replication level
> compromised.
> Shouldn't there be a daemon, or shouldn't it be part of the name node
> server's responsibilities, to recover from that failure?
> My point is that whenever a data node is gone, a replication process
> should be started in order to restore the replication level of all
> those files which have lost 1 replica.
> Then the replication level would be guaranteed.
> If the faulty node comes back during the recovery process, it should
> not be considered part of the data node group until this process
> is over. Then the file system would add the data node and free all the
> blocks contained in that data node. Or enable the node to join and
> delete only the files which have been modified during the time that
> node was down, and also delete the files which have already been
> replicated. It would save time and bandwidth, but the process would be
> more complex.
> Cheers
> Alfonso
