hadoop-hdfs-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: How to fix a corrupted disk?
Date Thu, 10 Jun 2010 14:04:00 GMT

On Jun 9, 2010, at 10:13 PM, Sean Bigdatafun wrote:

> I have two questions here about an HDFS cell. Suppose the file that I am
> interested in is stored on 3 datanodes A, B, and C, and A suddenly crashed.
> I understand I can still read my file, because I have two copies available
> at this moment. But my question is: which software module is responsible
> for bringing A back up? (Is there a watchdog server?)

No, there is not a watchdog.  Each installation is slightly different, and (almost) every OS
provides facilities to guarantee a daemon keeps running [SMF, launchd, daemontools,
etc.].  In most installations, I suspect wetware is used to bring back dead datanode processes,
so that the reason for the crash can be investigated.
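If you do want automatic restarts, a supervisor such as daemontools just needs a small "run" script that keeps the datanode in the foreground. This is only a sketch; the paths, the `hdfs` user, and the install layout are assumptions, not part of any stock Hadoop distribution:

```shell
#!/bin/sh
# Hypothetical daemontools-style "run" script that supervises a datanode.
# JAVA_HOME, HADOOP_HOME, and the "hdfs" user are assumed -- adjust for
# your own installation.
exec 2>&1
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/opt/hadoop

# Run the datanode in the foreground so the supervisor can notice an exit
# and restart the process automatically.
exec su hdfs -c "$HADOOP_HOME/bin/hadoop datanode"
```

Note the trade-off the answer above hints at: blind restarts can mask a failing disk that a human would otherwise investigate.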

> Furthermore, if the disk on server A is totally corrupted (disk failure),
> what should I do to bring my file back to 3-replica mode?

Fix the disk on A and restart the datanode process.
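As a rough sketch of what that looks like on node A, assuming a stock Apache Hadoop layout under `/opt/hadoop` and a data directory on the replaced disk (both paths are assumptions, check your own config):

```shell
# Hypothetical recovery steps on node A after replacing the failed disk.
cd /opt/hadoop                        # assumed HADOOP_HOME

# Recreate the dfs.data.dir on the new disk with the right ownership
# (directory path and hdfs:hadoop owner are assumptions):
mkdir -p /mnt/disk1/hdfs/data
chown hdfs:hadoop /mnt/disk1/hdfs/data

# Restart the datanode; it re-registers with the namenode on startup
# and begins reporting its (now empty) block set:
bin/hadoop-daemon.sh start datanode
```

Once the datanode re-registers, the namenode treats it as a candidate target for re-replication like any other node.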

When you have more than 3 datanodes, the namenode will automatically re-replicate any under-replicated
blocks if there is a node that is qualified to hold them.  [In other words, if you have a grid
large enough to support rack topology, the namenode will not violate the placement policy just to replicate
a block.  It is expected that there are enough nodes in enough racks to avoid policy violations.]