hadoop-user mailing list archives

From Travis <hcoy...@ghostar.org>
Subject Re: decommissioning disks on a data node
Date Fri, 17 Oct 2014 03:35:24 GMT
On Thu, Oct 16, 2014 at 10:01 PM, Colin Kincaid Williams <discord@uw.edu>
wrote:

> For some reason he seems intent on resetting the bad Virtual blocks, and
> giving the drives another shot. From what he told me, nothing is under
> warranty anymore. My first suggestion was to get rid of the disks.
>
> Here's the command:
>
> /opt/dell/srvadmin/bin/omconfig storage vdisk action=clearvdbadblocks
> controller=1 vdisk=$vid
>

Well, the usefulness of this action is going to entirely depend on how
you've actually set up the virtual disks.

If you've set it up so there's only one physical disk in each vdisk
(single-disk RAID0), then the bad "virtual" block is likely going to map to
a real bad block.

If you're doing something where there are multiple disks associated with
each virtual disk (e.g., RAID1, RAID10 ... can't remember if RAID5/RAID6 can
exhibit what follows), it's possible for the virtual device to have a bad
block that is actually mapped to a good physical block underneath.  This
can happen, for example, if you had a failing drive in the vdisk and
replaced it, but the controller had already remapped the bad virtual block
to somewhere good.  Even after you replace the drive with a good one, the
controller still thinks the bad block is there.  Dell calls it a punctured
stripe (for a better description, see
http://lists.us.dell.com/pipermail/linux-poweredge/2010-December/043832.html).
In this case, the fix is clearing the virtual badblock list with the above
command.


> I'm still curious about how hadoop blocks work. I'm assuming that each
> block is stored on one of the many mountpoints, and not divided between
> them. I know there is a tolerated volume failure option in hdfs-site.xml.
>

Correct.  Each HDFS block is actually treated as a file that lives on a
regular filesystem, like ext3 or ext4.  If you did an ls inside the
filesystem on one of your vdisks, you'd see the raw blocks that the
datanode is actually storing.  You just wouldn't be able to easily tell
what file that block was a part of because it's named with a block id, not
the actual file name.
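
If you want to see this for yourself, here is a rough Python sketch that
walks one datanode data directory and lists the block files in it.  It
assumes the usual on-disk naming, where a block is a file called
blk_<id> with a blk_<id>_<genstamp>.meta checksum file next to it.  The
function name and the path in the usage note are made up for
illustration, so adjust them for your own layout.

```python
import os
import re

# Assumption: block data files are named "blk_<id>" (the id can be negative
# on older releases); the matching checksum files end in ".meta" and are
# deliberately not matched here.
BLOCK_FILE = re.compile(r"^blk_-?\d+$")


def list_block_files(data_dir):
    """Return sorted (block_file_name, size_in_bytes) pairs under data_dir."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if BLOCK_FILE.match(name):
                path = os.path.join(root, name)
                found.append((name, os.path.getsize(path)))
    return sorted(found)
```

Calling list_block_files("/data/1/dfs/dn") (a hypothetical mount point)
gives you block ids and sizes only.  To go the other way, from an HDFS
file to its blocks, fsck with -files -blocks -locations is the usual
tool.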


> Then if the operations I laid out are legitimate, specifically removing
> the drive in question and restarting the data node. The advantage being
> less re-replication and less downtime.
>
>
Yup.  It will minimize the prolonged outage of the datanode itself.
You'll get a little re-replication while the datanode process is off, but
if you keep that time reasonably short, you should be fine.  When the
datanode process comes back up, it will walk all of its configured
filesystems, determine which blocks it still has on disk, and report that
back to the namenode.  Once that happens, re-replication will stop because
the namenode knows where those missing blocks are and no longer treats them
as under-replicated.

Note:  You'll still get some re-replication occurring for the blocks that
lived on the drive you removed.  But it's only a drive's worth of blocks,
not a whole datanode.
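
To make the difference concrete, here is a toy Python model of the
bookkeeping described above.  It is not how the namenode is actually
implemented; it only counts replicas from per-datanode block reports.
The function names, node names, mount points and block ids are all
invented for the example, and the replication factor of 3 is just the
HDFS default.

```python
def block_report(volumes):
    """Union of block ids on every volume the datanode still has mounted.

    volumes maps a mount point to the set of block ids stored on it.
    """
    report = set()
    for blocks in volumes.values():
        report |= set(blocks)
    return report


def under_replicated(all_blocks, reports, replication=3):
    """Block ids with fewer than `replication` replicas on reporting nodes.

    reports maps a datanode name to the set of block ids it reported.
    """
    return sorted(
        b for b in all_blocks
        if sum(b in blocks for blocks in reports.values()) < replication
    )


if __name__ == "__main__":
    all_blocks = {1, 2, 3, 4}
    others = {"dn2": {1, 2, 3, 4}, "dn3": {1, 2, 3, 4}}

    # dn1 is stopped: every block it held is short one replica.
    print(under_replicated(all_blocks, others))  # [1, 2, 3, 4]

    # dn1 restarts without the pulled drive (which held blocks 3 and 4).
    dn1 = block_report({"/data/1": {1, 2}})
    print(under_replicated(all_blocks, {"dn1": dn1, **others}))  # [3, 4]
```

While dn1 is down, everything it stored counts as under-replicated.  Once
it reports back in, only the blocks from the removed drive still need new
copies.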

Travis
-- 
Travis Campbell
travis@ghostar.org
