hadoop-mapreduce-user mailing list archives

From Travis <hcoy...@ghostar.org>
Subject Re: decommissioning disks on a data node
Date Fri, 17 Oct 2014 01:58:37 GMT
On Thu, Oct 16, 2014 at 7:03 PM, Colin Kincaid Williams <discord@uw.edu>
wrote:

> We have been seeing some of the disks on our cluster having bad blocks,
> and then failing. We are using some Dell PERC H700 disk controllers that
> create "virtual devices".
>
>
Are you doing a bunch of single-disk RAID0 devices with the PERC to mimic
JBOD?


> Our hosting manager uses a Dell utility that reports "virtual device bad
> blocks". He has suggested that we use the Dell tool to remove the "virtual
> device bad blocks" and then re-format the device.
>

Which Dell tool is he using for this?  The OMSA tools?  In practice, if
OMSA is telling you the drive is bad, the drive has likely already exhausted
all the reserved blocks it could use to remap bad sectors, and it's not
worth messing with it further.  Just get Dell to replace it (assuming your
hardware is under warranty or support).
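
If it is OMSA, the omreport CLI is usually the quickest way to see what the
controller thinks of each disk; something along these lines (the controller
ID is just an example, adjust for your box):

  # physical disks behind controller 0, including state and failure prediction
  omreport storage pdisk controller=0

  # the virtual disks (the "virtual devices" the PERC presents) on that controller
  omreport storage vdisk controller=0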


>
> I'm wondering if we can remove the disks in question from the
> hdfs-site.xml and restart the datanode, so that we don't re-replicate the
> Hadoop blocks on the other disks. Then we would go ahead and work on the
> troubled disk while the datanode remained up. Finally, we would restart the
> datanode again after re-adding the freshly formatted (possibly new) disk.
> This way the data on the remaining disks doesn't get re-replicated.
>
> I don't know too much about the Hadoop block system. Will this work? Is
> it an acceptable strategy for disk maintenance?
>

The data from the missing disk may still be re-replicated within your cluster
if the namenode determines that those blocks are under-replicated.
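
You can watch that happening with the stock tools, roughly like this:

  # cluster-wide block health, including the under-replicated block count
  hdfs fsck / | grep -i 'under-replicated'

  # per-datanode capacity and usage summary
  hdfs dfsadmin -report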

Unless your cluster is so tight on space that it couldn't handle taking
one disk out for maintenance, re-replicating the blocks from the missing
disk within the cluster should be fine.  You don't need to keep the
datanode down throughout the entire time you're running tests on the
drive.  The process you laid out is basically how we manage disk
maintenance on our Dells: stop the datanode, unmount the broken drive,
modify the hdfs-site.xml for that node, and restart it.
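
In hdfs-site.xml, the change is just dropping the failed drive's mount point
from the comma-separated data-directory list.  A minimal sketch, assuming the
bad drive was mounted at /data/3 (the paths here are made-up examples; older
releases use dfs.data.dir instead of dfs.datanode.data.dir):

  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- /data/3/dfs/dn removed while that drive is out for maintenance -->
    <value>/data/0/dfs/dn,/data/1/dfs/dn,/data/2/dfs/dn</value>
  </property>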

I've automated some of this process with Puppet by taking advantage of
ext3/ext4's ability to set a filesystem label on each partition, which Puppet
then looks for when configuring mapred-site.xml and hdfs-site.xml (a rough
sketch of the labeling side is below).  I talk about it in a few blog posts
from a few years back if you're interested:

  http://www.ghostar.org/2011/03/hadoop-facter-and-the-puppet-marionette/

  http://www.ghostar.org/2013/05/using-cobbler-with-a-fast-file-system-creation-snippet-for-kickstart-post-install/
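
The labeling part is roughly this (device names and the label scheme are made
up for illustration; the discovery logic on the Facter/Puppet side is what the
posts describe):

  # stamp the filesystem with a label when it's first created
  mkfs.ext4 -L HDFS-DATA-3 /dev/sdd1

  # find the device carrying that label later (e.g. from a custom Facter fact)
  blkid -t LABEL=HDFS-DATA-3 -o device

  # fstab entry: mount by label so device reordering doesn't matter
  LABEL=HDFS-DATA-3  /data/3  ext4  defaults,noatime  0 0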


Cheers,
Travis
-- 
Travis Campbell
travis@ghostar.org
