hadoop-common-dev mailing list archives

From "Alfonso Olias Sanz" <alfonso.olias.s...@gmail.com>
Subject [RFC] Replication Level policy improvement
Date Wed, 02 Apr 2008 14:01:42 GMT
Hi Hadoopers,

I opened a discussion on the core-users list about the replication
level.  When a data node dies, can all the blocks (and therefore
files) stored on that node be considered lost?

And if that node never comes back, or takes a very long time before it
is ready again, some files can end up with a compromised replication
level.

Shouldn't there be a daemon, or shouldn't it be part of the name node
server's responsibilities, to recover from that failure?

My point is that whenever a data node is gone, a replication process
should be started in order to restore the replication level of all
files that have lost a replica.

Then the replication level would be guaranteed.
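To make the idea concrete, here is a rough sketch of what such a daemon
could look like.  This is not the actual name node code: ClusterState
and everything behind it (findUnderReplicatedBlocks, pickSourceReplica,
pickTargetNode, scheduleCopy) are hypothetical stand-ins for whatever
bookkeeping the name node keeps about live nodes and block locations.

import java.util.List;

/** Hypothetical view of the cluster state; not a real Hadoop API. */
interface ClusterState {
    List<String> findUnderReplicatedBlocks();                 // block ids below their target replication
    String pickSourceReplica(String blockId);                 // a live node that still holds the block
    String pickTargetNode(String blockId);                    // a live node that does not hold it yet
    void scheduleCopy(String blockId, String src, String dst);
}

/** Sketch of the proposed re-replication daemon. */
public class ReReplicationMonitor implements Runnable {
    private final ClusterState cluster;
    private final long checkIntervalMs;

    public ReReplicationMonitor(ClusterState cluster, long checkIntervalMs) {
        this.cluster = cluster;
        this.checkIntervalMs = checkIntervalMs;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // When a data node is declared dead, its blocks drop below their
            // target replication; copy each of them to another live node.
            for (String blockId : cluster.findUnderReplicatedBlocks()) {
                String src = cluster.pickSourceReplica(blockId);
                String dst = cluster.pickTargetNode(blockId);
                if (src != null && dst != null) {
                    cluster.scheduleCopy(blockId, src, dst);
                }
            }
            try {
                Thread.sleep(checkIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}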

If the faulty node comes back during the recovery process, it should
not be considered part of the data node group until this process is
over; the file system would then add the data node and free all the
blocks contained on it.  Alternatively, the node could be allowed to
join right away, deleting only the files that were modified while the
node was down as well as the files that have already been replicated
elsewhere.  That would save time and bandwidth, but the process would
be more complex.
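
The two rejoin options could be compared with something like the sketch
below.  Again, ReturningNode and RecoveryState are made-up interfaces
used only to illustrate the policy; they are not part of HDFS.

import java.util.List;

/** Hypothetical handle on a data node that has just come back; not a real Hadoop API. */
interface ReturningNode {
    List<String> storedBlockIds();
    void deleteBlock(String blockId);
    void markLive();
}

/** Hypothetical recovery bookkeeping kept by the name node. */
interface RecoveryState {
    boolean recoveryInProgress();
    boolean modifiedWhileDown(String blockId);    // block changed while the node was dead
    boolean alreadyReReplicated(String blockId);  // replica already re-created elsewhere
}

/** Sketch of the two rejoin policies described above. */
public class RejoinPolicy {

    /** Simpler policy: keep the node out until recovery ends, then free everything it holds. */
    public static void rejoinAfterRecovery(ReturningNode node, RecoveryState recovery) {
        if (recovery.recoveryInProgress()) {
            return;                                // not part of the group yet; retry later
        }
        for (String blockId : node.storedBlockIds()) {
            node.deleteBlock(blockId);             // the cluster already has full replicas
        }
        node.markLive();
    }

    /** Cheaper policy: join right away, dropping only stale or already re-replicated blocks. */
    public static void rejoinImmediately(ReturningNode node, RecoveryState recovery) {
        for (String blockId : node.storedBlockIds()) {
            if (recovery.modifiedWhileDown(blockId) || recovery.alreadyReReplicated(blockId)) {
                node.deleteBlock(blockId);
            }
        }
        node.markLive();
    }
}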

Cheers
Alfonso
